In June 2014, the US Food and Drug Administration (FDA) launched a technology-driven "openFDA" initiative to make it easier to access and interpret the agency's publicly available data.
The openFDA initiative grew out of a White House order that called for federal agencies to implement a new "digital strategy" to make government information more accessible to the public.
The strategy focused on creating APIs (application programming interfaces, not active pharmaceutical ingredients) that would allow developers to access raw data from agencies and integrate it into new, easy-to-use software applications.
Two years later, openFDA launched, allowing users to interact with an ever-growing list of APIs, including APIs for adverse drug events, medical device reports, enforcement reports and approved drug labeling.
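For readers who want a sense of what interacting with these APIs looks like, the following is a minimal sketch in Python that only builds query URLs and does not contact the service. The base URL, the endpoint paths and the `search`, `count` and `limit` parameters are taken from openFDA's public documentation as best understood here and should be treated as assumptions; the helper name `build_query_url` and the dataset keys are purely illustrative.

```python
from urllib.parse import urlencode

# Assumed base URL and endpoint paths for the four datasets named above.
BASE_URL = "https://api.fda.gov"
ENDPOINTS = {
    "drug_events": "/drug/event.json",        # adverse drug events
    "device_events": "/device/event.json",    # medical device reports
    "enforcement": "/drug/enforcement.json",  # enforcement reports (drug recalls)
    "drug_labels": "/drug/label.json",        # approved drug labeling
}


def build_query_url(dataset, search=None, count=None, limit=None):
    """Return a query URL for one dataset (hypothetical helper, no network call)."""
    params = {}
    if search:
        params["search"] = search  # field:term filter
    if count:
        params["count"] = count    # field to tally instead of returning records
    if limit is not None:
        params["limit"] = limit    # maximum number of records to return
    url = BASE_URL + ENDPOINTS[dataset]
    if params:
        url += "?" + urlencode(params)
    return url


if __name__ == "__main__":
    print(build_query_url(
        "drug_events",
        search="patient.drug.medicinalproduct:aspirin",
        limit=5,
    ))
```

The resulting URL can be pasted into a browser or passed to any HTTP client; the field name in the example search is likewise an assumption about the adverse event schema.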
The launch was hailed as a success by the blog API Evangelist, which said openFDA delivers "the best first impression possible, frictionless on-boarding, and the required education you will need to put [it] to work."
Speaking at the Big Data in Biomedicine conference at Stanford University in May 2015, FDA Chief Health Informatics Officer Taha Kass-Hout said openFDA received 100 queries per second on its first day, and hit one million queries a few days later.
One Year On
During his presentation, Kass-Hout said the primary question driving openFDA is "How can we spur innovation, not only within FDA, but also in the community?"
Part of the answer seems to be making vast datasets available to the public. "We start with already public data, but we prioritize based on interest," Kass-Hout said, pointing to some of the datasets already available:
- 68,000 structured product labels (SPL) for drugs
- 3.9 million records for Manufacturer and User Facility Device Experience data (MAUDE), 1991 through 30 April 2015
- 41,000 Recall Enterprise System (RES) records from 2012 to 1 May 2015
- 4.6 million adverse events for drugs from 2003 to 30 June 2014
Part of openFDA's ongoing success comes from the agency's active engagement with the developer community on GitHub, Stack Exchange and Twitter. Kass-Hout also points out that much of the progress made with openFDA comes from outside the agency, saying the "top 40 developers of openFDA are not FDA employees; they are community members from all around the world."
Much of this effort involves data harmonization:
"The raw data we deal with is multi-format, some even have archaic formats that are no longer supported by the community, so we look at all these data sets and we convert all those into open and common standard [formats]."
Kass-Hout says this process of cleaning and linking the data is "perhaps the most important step in the entire process in openFDA." Beyond cleaning and linking, he says, adding metadata to records can correct errors such as misspelled names in consumer-generated adverse event reports.
Kass-Hout also says this process can add missing information to records:
"For enforcement reports or recalls we use data mining techniques to extract NDC codes and UPC codes. Oftentimes UPC codes or the barcode is missing so what we do is scour google images to find high-res images of the boxes where we extract the UPC code and add it back to the recall record."
According to Kass-Hout, more than 20,000 unique IP addresses have connected to openFDA, and the site has more than 6,000 registered API users. Additionally, of the 11 million API calls the site has received, more than half have come from outside the US.
As of May, he says FDA is aware of 30 new software applications built or in development with openFDA. One such application, genderedreactions.com, was "built over one weekend by a second-year medical student at Leeds University in England," and allows users to see how adverse events for drugs differ by gender.
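A comparison of this kind can be sketched in a few lines of Python. The example below is not how genderedreactions.com works, which the presentation does not describe; it assumes that two count queries have already been run against the adverse event API, one per sex, and that each returned a list of `{"term": ..., "count": ...}` entries, which is the shape openFDA count results are understood to take. The numbers shown are invented.

```python
def reaction_shares(results):
    """Convert a list of {"term", "count"} entries into each term's share of the total."""
    total = sum(entry["count"] for entry in results)
    if total == 0:
        return {}
    return {entry["term"]: entry["count"] / total for entry in results}


def compare_by_sex(female_results, male_results):
    """Return female share minus male share for every reaction term seen in either list."""
    female = reaction_shares(female_results)
    male = reaction_shares(male_results)
    terms = sorted(set(female) | set(male))
    return {t: round(female.get(t, 0.0) - male.get(t, 0.0), 3) for t in terms}


if __name__ == "__main__":
    # Invented counts, for illustration only.
    female = [{"term": "NAUSEA", "count": 60}, {"term": "HEADACHE", "count": 40}]
    male = [{"term": "NAUSEA", "count": 30}, {"term": "HEADACHE", "count": 70}]
    print(compare_by_sex(female, male))
```

Comparing shares rather than raw counts keeps the result from being dominated by whichever group simply filed more reports.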
openFDA: The First Year in Perspective