NIH Cohort Retrieval

Project Overview

Screenshots from the patient cohort retrieval system.

Electroencephalography (EEG) records and measures the electrical activity of the brain. The collected information is used to diagnose brain disorders and diseases, such as epilepsy and Alzheimer's disease. EEG is also used in research fields such as neuroscience, medicine, and bioengineering. EEG signals are complex, and their interpretation varies between observers, which can lead to inconsistent diagnoses and slow research progress.

Currently, there is a lack of search technologies that can operate efficiently on EEG data resources because:

1) EEG signals and reports are difficult to process and analyze automatically.

2) It is difficult to automatically characterize patients based on their EEG reports and signals.

Electronic medical records (EMRs) contain unstructured text, temporally constrained measurements (e.g., vital signs), multichannel signals (e.g., EEGs), and image data (e.g., MRIs).

We are developing a patient cohort retrieval system that allows clinicians to retrieve relevant EEG signals and EEG reports using natural-language queries, such as:

“Young patients with focal cerebral dysfunction who were treated with Topamax”

To do this, we develop automated techniques that use data-driven approaches and semi-supervised learning to discover and align the underlying EEG events that led to a diagnosis. Several rounds of consultation with specialized neurologists at Temple University Hospital (TUH) showed that the events most useful to target are Spike and Sharp Waves (SPSW), Generalized Periodic Epileptiform Discharges (GPEDs), Periodic Lateralized Epileptiform Discharges (PLEDs), and Seizures (SEIZs). Our technology also recognizes events such as Eye Movements (EYEMs), Artifacts (ARTFs), and Background (BCKG).
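For reference, the event inventory above can be written down as a small data structure that both the signal recognizer and the retrieval layer could share. This is a hypothetical Python sketch: the names `EEGEvent`, `TARGET_EVENTS`, `AUXILIARY_EVENTS`, and `is_target` are illustrative, and only the abbreviations and their expansions come from the text.

```python
from enum import Enum


class EEGEvent(Enum):
    """Event labels named in the overview (hypothetical Python encoding)."""
    SPSW = "spike and sharp wave"
    GPED = "generalized periodic epileptiform discharge"
    PLED = "periodic lateralized epileptiform discharge"
    SEIZ = "seizure"
    EYEM = "eye movement"
    ARTF = "artifact"
    BCKG = "background"


# Events the TUH neurologists identified as targets.
TARGET_EVENTS = frozenset({EEGEvent.SPSW, EEGEvent.GPED, EEGEvent.PLED, EEGEvent.SEIZ})
# Remaining events, recognized so they can be separated from the targets.
AUXILIARY_EVENTS = frozenset(set(EEGEvent) - TARGET_EVENTS)


def is_target(event: EEGEvent) -> bool:
    """True if the event is one of the clinically targeted event types."""
    return event in TARGET_EVENTS
```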

The EEG signal recognition system uses three levels of processing for event recognition: Hidden Markov Models (HMMs) that perform sequential decoding on each EEG channel; deep learning that adds spatial and temporal context to differentiate between periodic and isolated events; and a statistical language model that models event sequences.
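Two of these three levels can be illustrated in a few lines. The sketch below assumes each channel has already been reduced to a sequence of discrete observation symbols; it shows textbook Viterbi decoding for a per-channel HMM, plus an add-alpha smoothed bigram model over event labels as a simple stand-in for the statistical language model. The deep-learning stage is omitted. The function names, the observation symbols `"low"`/`"high"`, and all probabilities are hypothetical and are not the project's actual models or parameters.

```python
import math
from collections import Counter, defaultdict


def viterbi(observations, states, log_start, log_trans, log_emit):
    """Return the most likely hidden-state sequence for one channel.

    log_start[s], log_trans[p][s] and log_emit[s][o] are natural-log probabilities.
    """
    # best[t][s]: highest log-score of any state path ending in s at frame t
    best = [{s: log_start[s] + log_emit[s][observations[0]] for s in states}]
    back = [{}]  # back[t][s]: predecessor of s on that best path
    for t in range(1, len(observations)):
        best.append({})
        back.append({})
        for s in states:
            prev = max(states, key=lambda p: best[t - 1][p] + log_trans[p][s])
            best[t][s] = best[t - 1][prev] + log_trans[prev][s] + log_emit[s][observations[t]]
            back[t][s] = prev
    # Trace back from the highest-scoring final state
    path = [max(states, key=lambda s: best[-1][s])]
    for t in range(len(observations) - 1, 0, -1):
        path.append(back[t][path[-1]])
    return path[::-1]


def train_bigram(sequences, vocab, alpha=1.0):
    """Add-alpha smoothed bigram model over event labels; "<s>" marks sequence start."""
    counts = defaultdict(Counter)
    for seq in sequences:
        for prev, cur in zip(["<s>"] + list(seq), seq):
            counts[prev][cur] += 1

    def log_prob(prev, cur):
        total = sum(counts[prev].values())
        return math.log((counts[prev][cur] + alpha) / (total + alpha * len(vocab)))

    return log_prob


def sequence_log_prob(seq, log_prob):
    """Score a decoded event sequence under the bigram model."""
    return sum(log_prob(p, c) for p, c in zip(["<s>"] + list(seq), seq))


# Toy example: two states, observations already quantized to "low"/"high"
log = math.log
states = ["BCKG", "SPSW"]
log_start = {"BCKG": log(0.8), "SPSW": log(0.2)}
log_trans = {"BCKG": {"BCKG": log(0.9), "SPSW": log(0.1)},
             "SPSW": {"BCKG": log(0.5), "SPSW": log(0.5)}}
log_emit = {"BCKG": {"low": log(0.9), "high": log(0.1)},
            "SPSW": {"low": log(0.2), "high": log(0.8)}}
path = viterbi(["low", "high", "high", "low"], states, log_start, log_trans, log_emit)
# path == ['BCKG', 'SPSW', 'SPSW', 'BCKG']
```

Working in log space avoids numerical underflow on long recordings; the bigram scores can then be used to prefer decoded event sequences that are plausible as a whole.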

In addition to the EEG signals, we take advantage of the EEG reports that describe the recordings, which contain dense spatial and temporal information about the described events. We mine these reports automatically to identify spatial and temporal expressions, several types of medical concepts, tests, and treatments.
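A crude, rule-based stand-in for this mining step is sketched below, assuming plain-text reports. It is not the project's extraction method (the text does not describe it): the regular expressions, the three-drug lexicon, and the `extract` function are hypothetical and only illustrate the kinds of spans being targeted.

```python
import re

# Hypothetical, hand-written patterns for illustration only.
# Durations and repetition rates such as "every 1-2 seconds" or "lasting 30 seconds".
TEMPORAL = re.compile(
    r"\b(?:for|lasting|every)\s+\d+(?:\s*-\s*\d+)?\s*(?:seconds?|minutes?|hours?)\b",
    re.IGNORECASE,
)
# Brain regions, optionally with laterality, such as "left temporal".
REGION = re.compile(
    r"\b(?:(?:left|right|bilateral)\s+)?"
    r"(?:frontotemporal|centrotemporal|frontal|temporal|parietal|occipital|central)\b",
    re.IGNORECASE,
)
# Common 10-20 system electrode labels, such as "T3" or "Fp1".
ELECTRODE = re.compile(r"\b(?:Fp[12]|F[3478z]|C[34z]|T[3-6]|P[34z]|O[12])\b")
# Tiny illustrative treatment lexicon; a real system would use a medical vocabulary.
TREATMENTS = {"topamax", "keppra", "dilantin"}


def extract(report: str) -> dict:
    """Return the temporal, spatial, and treatment mentions found in one report."""
    return {
        "temporal": [m.group(0) for m in TEMPORAL.finditer(report)],
        "spatial": [m.group(0) for m in REGION.finditer(report)] + ELECTRODE.findall(report),
        "treatment": [w for w in re.findall(r"[A-Za-z]+", report) if w.lower() in TREATMENTS],
    }
```

On the sentence "Sharp waves are seen in the left temporal region at T3, recurring every 1-2 seconds. The patient is on Topamax." this returns `"left temporal"` and `"T3"` as spatial mentions, `"every 1-2 seconds"` as a temporal expression, and `"Topamax"` as a treatment.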

Using knowledge of the events present in the EEG signals and of the medical concepts automatically discovered in the reports, we develop an automatic patient cohort retrieval system. This system, based on MapReduce, is designed to search free-text chart notes and EEG signals.
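The MapReduce pattern behind this kind of retrieval can be illustrated in a single process. The sketch below assumes each patient's report and signal have already been reduced to a list of concept and event labels (the `Record` format, the function names, and the example data are hypothetical). It builds an inverted index with explicit map, shuffle, and reduce steps, then answers a query by intersecting posting sets. A production system would run the same phases on a cluster framework such as Hadoop; attribute filters such as patient age are omitted.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Set, Tuple

# Hypothetical record: (patient_id, concepts and events found for that patient)
Record = Tuple[str, List[str]]


def map_record(record: Record) -> Iterator[Tuple[str, str]]:
    """Map: emit one (concept, patient_id) pair per distinct concept in a record."""
    patient_id, concepts = record
    for concept in set(concepts):
        yield concept.lower(), patient_id


def shuffle(pairs: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Shuffle: group emitted values by key, as the framework does between phases."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_postings(concept: str, patient_ids: List[str]) -> Tuple[str, Set[str]]:
    """Reduce: collapse the patient ids for one concept into a posting set."""
    return concept, set(patient_ids)


def build_index(records: Iterable[Record]) -> Dict[str, Set[str]]:
    pairs = (pair for record in records for pair in map_record(record))
    return dict(reduce_postings(k, v) for k, v in shuffle(pairs).items())


def cohort(index: Dict[str, Set[str]], query_concepts: List[str]) -> Set[str]:
    """Patients whose records contain every query concept (Boolean AND)."""
    postings = [index.get(c.lower(), set()) for c in query_concepts]
    return set.intersection(*postings) if postings else set()


records = [
    ("p1", ["focal cerebral dysfunction", "Topamax", "SPSW"]),
    ("p2", ["focal cerebral dysfunction", "Keppra"]),
    ("p3", ["Topamax", "focal cerebral dysfunction", "PLED"]),
]
index = build_index(records)
matches = sorted(cohort(index, ["focal cerebral dysfunction", "Topamax"]))  # ['p1', 'p3']
```

Because the map and reduce functions only see one record or one key at a time, each phase can be partitioned across machines without changing the logic.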

The ability to find groups of patients with similar conditions is a significant advantage for tasks such as training medical students who wish to study several variations of similar cases, supporting practicing neurologists' decisions when prescribing medications or treatments, and conducting various types of research studies.

The technology we develop allows users not only to find patient cohorts but also to visualize the time-aligned annotations in the EEG signals and the annotated medical reports for each EEG session.