Unstructured data sources such as adverse event reports, patient surveys, publications, and social media carry important information about medical product safety. Two primary challenges with unstructured sources are the nuances of written language (misspellings, alternative ways of expressing the same concepts) and keeping up with the data volume as new records are created.
Techniques from natural language processing (NLP) and machine learning take us toward the goal of extracting meaning from text. Advances in computer hardware (faster and more numerous processors), along with the flexibility of cloud architectures, in particular the ability to split intensive data processing tasks among multiple nodes on demand, have greatly increased the amount of data that can be processed within a reasonable time frame while also reducing overall cost.
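To make the idea of splitting work across nodes concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not a description of any particular production system: a local thread pool stands in for on-demand cloud nodes, and the names `chunk`, `process_batch`, and `process_all` are hypothetical. The "NLP step" is a placeholder that only counts tokens.

```python
from concurrent.futures import ThreadPoolExecutor


def chunk(records, size):
    """Split a list of records into consecutive batches of at most `size` items."""
    return [records[i:i + size] for i in range(0, len(records), size)]


def process_batch(batch):
    """Placeholder for a real NLP step: count whitespace-separated tokens per record."""
    return [len(text.split()) for text in batch]


def process_all(records, batch_size=2, workers=4):
    """Fan batches out to a worker pool and reassemble results in input order."""
    batches = chunk(records, batch_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map yields results in the same order as the input batches.
        results = list(pool.map(process_batch, batches))
    # Flatten the per-batch results back into one list, one entry per record.
    return [count for batch_result in results for count in batch_result]


# Hypothetical sample records, invented for illustration only.
records = [
    "Patient reported severe headache after first dose",
    "Device battery failed during use",
    "No adverse events noted",
]
token_counts = process_all(records)
```

Because each batch is processed independently, the same pattern carries over to a setting where batches are dispatched to separate machines; only the executor changes. For CPU-heavy work in Python, a process pool or a distributed job queue would be a more realistic stand-in than threads.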
The Boomerang NLP Framework is Pharm3r’s in-house solution designed to support efficient natural language processing in the cloud. Supplied with arbitrary medical text, it is capable of extracting several important data types, including references to medical products (drugs and devices), business entities (such as manufacturers, distributors, hospitals, and research institutions), adverse events, patient indications (pre-conditions), and product issues (such as design defects, issues with manufacturing quality control, and malfunctions occurring in the field). Extracted terms can be expressed in a variety of standard medical terminologies, including MedDRA, SNOMED CT, and ICD-9/ICD-10.
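As a rough illustration of what term extraction from free text can look like, the sketch below matches word sequences against a small lexicon and tolerates misspellings through fuzzy string matching with Python's standard `difflib` module. This is not Boomerang's actual method, which is not described here. The lexicon, the entity type labels, and the concept IDs (`AE-0001` and so on) are hypothetical placeholders, not real MedDRA, SNOMED CT, or ICD codes.

```python
import difflib
import re

# Hypothetical mini-lexicon: canonical term -> (entity type, placeholder concept ID).
LEXICON = {
    "headache": ("adverse_event", "AE-0001"),
    "nausea": ("adverse_event", "AE-0002"),
    "battery failure": ("product_issue", "PI-0001"),
    "insulin pump": ("medical_product", "MP-0001"),
}


def extract_terms(text, cutoff=0.8):
    """Return (matched text, canonical term, entity type, concept ID) tuples in text order."""
    tokens = re.findall(r"[a-z]+", text.lower())

    # Group lexicon terms by word count so that each n-gram is only
    # compared with terms of the same number of words.
    by_length = {}
    for term in LEXICON:
        by_length.setdefault(len(term.split()), []).append(term)

    used = set()  # token positions already claimed by a longer match
    hits = []
    for n in sorted(by_length, reverse=True):  # prefer longer terms first
        for start in range(len(tokens) - n + 1):
            span = range(start, start + n)
            if any(i in used for i in span):
                continue
            candidate = " ".join(tokens[start:start + n])
            # Fuzzy match tolerates small spelling differences.
            match = difflib.get_close_matches(candidate, by_length[n], n=1, cutoff=cutoff)
            if match:
                entity_type, concept_id = LEXICON[match[0]]
                hits.append((start, candidate, match[0], entity_type, concept_id))
                used.update(span)

    hits.sort()  # restore the order in which terms appear in the text
    return [hit[1:] for hit in hits]


# Hypothetical input containing two misspellings ("headach", "nausia").
found = extract_terms("Patient had a headach and nausia after the insulin pump alarm")
```

A real system would need far more than this: a full terminology, handling of negation and context, and disambiguation between entity types. The sketch only shows the basic step of normalizing varied or misspelled surface forms to a canonical concept.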
We’re pleased to announce that the U.S. Patent Office has just granted a patent on Boomerang. We’re always on the lookout for new applications for Boomerang. Do you have an unstructured data problem we can solve?