We show how topic modeling techniques can be used with the ClarityNLP platform to extract, from unstructured clinical notes, patient-level features that are unlikely to appear in structured data (e.g., sociodemographic characteristics, socioeconomic characteristics, and history of substance abuse). These features can be used as inputs to patient-level predictive models and/or to identify patients who are eligible to participate in clinical trials.

Learning objective: Attendees will better understand how to use unsupervised and semi-supervised, human-in-the-loop machine learning techniques, such as topic modeling, to extract information from unstructured clinical notes for feature engineering and/or iterative computational phenotype development.
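
As a rough illustration of the idea described above, the following sketch fits a small topic model to a few note snippets and uses each patient's topic proportions as a feature vector. It is only a sketch under stated assumptions: it does not use ClarityNLP or its API, it swaps in a minimal collapsed Gibbs sampler for latent Dirichlet allocation written with the Python standard library, and the notes, stopword list, function names, and hyperparameters are all invented for illustration.

```python
import random
from collections import Counter

# Synthetic, made-up note snippets (assumption: not real patient data).
NOTES = {
    "patient_a": "patient reports daily alcohol use and prior heroin use denies tobacco",
    "patient_b": "patient is homeless currently unemployed and lacks health insurance",
    "patient_c": "history of alcohol abuse patient unemployed and living in shelter",
}
# Hypothetical stopword list chosen for this toy example.
STOPWORDS = {"and", "of", "in", "is", "the", "patient", "reports", "denies"}


def tokenize(text):
    """Lowercase, split on whitespace, and drop stopwords."""
    return [w for w in text.lower().split() if w not in STOPWORDS]


def fit_lda(docs, n_topics=2, n_iter=200, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA; returns per-document topic proportions."""
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    doc_topic = [[0] * n_topics for _ in docs]          # topic counts per document
    topic_word = [Counter() for _ in range(n_topics)]   # word counts per topic
    topic_total = [0] * n_topics                        # token counts per topic
    assignments = []

    # Randomly assign an initial topic to every token.
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            z = rng.randrange(n_topics)
            z_doc.append(z)
            doc_topic[d][z] += 1
            topic_word[z][w] += 1
            topic_total[z] += 1
        assignments.append(z_doc)

    # Resample each token's topic conditioned on all other assignments.
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                z = assignments[d][i]
                doc_topic[d][z] -= 1
                topic_word[z][w] -= 1
                topic_total[z] -= 1
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][w] + beta)
                    / (topic_total[k] + vocab_size * beta)
                    for k in range(n_topics)
                ]
                z = rng.choices(range(n_topics), weights=weights)[0]
                assignments[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][w] += 1
                topic_total[z] += 1

    # Smoothed topic proportions: one feature vector per document.
    features = []
    for d, doc in enumerate(docs):
        denom = len(doc) + n_topics * alpha
        features.append([(doc_topic[d][k] + alpha) / denom for k in range(n_topics)])
    return features, topic_word


docs = [tokenize(text) for text in NOTES.values()]
features, topic_word = fit_lda(docs)

for patient_id, row in zip(NOTES, features):
    print(patient_id, [round(p, 3) for p in row])
for k, counts in enumerate(topic_word):
    print("topic", k, [w for w, _ in counts.most_common(4)])
```

Each row of `features` could be joined to structured data as additional model inputs. In a human-in-the-loop setting, a reviewer would typically inspect the top words per topic and decide which topics correspond to a meaningful characteristic before using them.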


Christine Herlihy (Presenter)
Georgia Tech Research Institute

Charity Hilton, Georgia Tech Research Institute
Richard Boyd, Georgia Tech Research Institute
Trey Schneider, Georgia Tech Research Institute
Chirag Jamadagni, Georgia Tech Research Institute
Jon Duke, Georgia Tech Research Institute

Presentation Materials: