Electroencephalography (EEG) Resources

 

Corpora: IBMFT | TUEG | TUAB | TUAR | TUEP | TUEV | TUSZ | TUSL
Software: ANNO | DEMO | EVAL | HEAD | LABL | PYPR | PYST | MEDF | EDFB | PYED
Documentation: ELEC | ANNO | FNAM | EDFH | LBLS | TUTO
Instructions: RSYN | WGET | DISK
What's New: 2021 | 2020 | 2019 | 2018 | 2017 | 2016 | 2015

To request access to the TUH EEG Corpus, please fill out this form. You will receive an automatically generated username and password via email. The data is unencumbered and can be used for both research and commercialization purposes.

The TUH EEG Corpus is freely available. The only reason we require registration is that we need to track who downloads the data. We also want to be able to inform you of any updates to the releases.

Once you have obtained the username and password, you can selectively download portions of the corpus using your browser. You can also use the rsync interface described below.


Corpora

  • IBM Features For Seizure Detection (IBMFT): Features developed by IBM to support deep learning experiments on seizure detection. More information about this work can be found here.

  • The TUH EEG Corpus (TUEG): A rich archive of over 30,000 clinical EEG recordings collected at Temple University Hospital (TUH) from 2002 to the present. Read this journal paper for a more complete description of the corpus.

  • The TUH Abnormal EEG Corpus (TUAB): A corpus of EEGs that have been annotated as normal or abnormal. Read Silvia Lopez's MS thesis for a description of the corpus.

  • The TUH EEG Artifact Corpus (TUAR): This is a subset of TUEG that contains annotations of 5 different artifacts: (1) eye movement (EYEM), (2) chewing (CHEW), (3) shivering (SHIV), (4) electrode pop, electrode static, and lead artifacts (ELPP), and (5) muscle artifacts (MUSC).

  • The TUH EEG Epilepsy Corpus (TUEP): This is a subset of TUEG that contains 100 subjects with epilepsy and 100 subjects without epilepsy, as determined by a certified neurologist. The data was developed in collaboration with a number of partners including NIH.

  • The TUH EEG Events Corpus (TUEV): This corpus is a subset of TUEG that contains annotations of EEG segments as one of six classes: (1) spike and sharp wave (SPSW), (2) generalized periodic epileptiform discharges (GPED), (3) periodic lateralized epileptiform discharges (PLED), (4) eye movement (EYEM), (5) artifact (ARTF) and (6) background (BCKG).

  • The TUH EEG Seizure Corpus (TUSZ): This corpus contains EEG signals that have been manually and carefully annotated for seizures. For more information about this corpus, please refer to our book section. Our annotation guidelines are described here.

  • The TUH EEG Slowing Corpus (TUSL): This is another subset of TUEG that contains annotations of slowing events. This corpus has been used to study common error modalities in automated seizure detection.




Software

  • NEDC Annotation Tool (ANNO): A tool that allows rapid annotation of EEG signals. The tool includes spectrogram and energy plots, and is capable of transcribing data in real time. Learn more about this tool from our IEEE SPMB 2018 paper.

  • NEDC Real-Time Seizure Detection Tool (DEMO): A tool that is capable of detecting seizures in real time from streamed EEG data.

  • NEDC Eval EEG (EVAL): A Python-based scoring package that implements a variety of standard evaluation metrics. A complete description of the software can be found here.

  • NEDC Print Header (HEAD): This distribution contains two simple programs, nedc_print_header and nedc_print_signal, that can be used to view EDF file information. For example, nedc_print_header can be used with grep to view a specific header field across all files in the database.

  • NEDC Print Labels (LABL): A Python-based software package that demonstrates how to read our annotation files by loading files and displaying the annotations in a hierarchical format.

  • NEDC Python Print EDF Header and Signal (PYPR): Demonstration software that describes how to properly read EDF files and print the header and signal data. This software contains a robust implementation of a function to read an EDF header.

  • NEDC Python Streaming Software (PYST): Demonstration software that describes how to properly read EDF files. Please read our document describing the organization of the electrode data in an EDF file here to understand why this software is critical to your ability to correctly process EEG data.

  • MATLAB EDF (MEDF): MATLAB code that loads EEG signal data from an EDF file.

  • EDF Browser (EDFB): An open-source program that can be used to view signal files such as EEG, EMG, and ECG recordings. It is available for Windows and Linux.

  • Python-based EDF (PYED): A Python interface to EDFLib that lets you read and write EDF files (the distribution format for TUH EEG).



Documentation

  • Electrodes (ELEC): A document that describes how EEG signals are stored in a multichannel signal file format. This document also includes a description of the channel labels, which are required to properly decode the data.

  • Annotations (ANNO): A document that describes how we annotate seizures and store the annotations in various file formats.

  • Filename Conventions (FNAM): A document that describes the structure of filenames and directories in our corpora.

  • EDF Header (EDFH): A byte-by-byte description of the data stored in an EDF header.

  • Labels (LBLS): A spreadsheet describing the labels used to annotate our seizure detection corpus (TUSZ).

  • Tutorials (TUTO): Two of our early tutorials describing EEG signals and the annotation problem.
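
To give a feel for the kind of byte-by-byte layout the EDF Header (EDFH) document covers, here is a minimal Python sketch that splits the fixed 256-byte portion of an EDF header into its fields. The field widths follow the public EDF specification. The field names and the functions parse_fixed_header and read_fixed_header are labels invented for this sketch. This is not the NEDC implementation, and it does not decode the per-signal part of the header that follows the first 256 bytes.

```python
# Fixed-length portion of an EDF header as (field name, width in bytes).
# Widths follow the public EDF specification; the names are illustrative
# labels chosen for this sketch, not identifiers from the NEDC software.
EDF_FIXED_FIELDS = [
    ("version", 8),
    ("patient_id", 80),
    ("recording_id", 80),
    ("start_date", 8),
    ("start_time", 8),
    ("header_bytes", 8),
    ("reserved", 44),
    ("num_records", 8),
    ("record_duration", 8),
    ("num_signals", 4),
]

FIXED_HEADER_SIZE = sum(width for _, width in EDF_FIXED_FIELDS)  # 256 bytes


def parse_fixed_header(raw: bytes) -> dict:
    """Split the first 256 bytes of an EDF file into space-stripped strings."""
    if len(raw) < FIXED_HEADER_SIZE:
        raise ValueError("need at least %d bytes" % FIXED_HEADER_SIZE)
    fields = {}
    offset = 0
    for name, width in EDF_FIXED_FIELDS:
        chunk = raw[offset:offset + width]
        # EDF header fields are ASCII text padded with spaces.
        fields[name] = chunk.decode("ascii", errors="replace").strip()
        offset += width
    return fields


def read_fixed_header(path: str) -> dict:
    """Read and parse the fixed header of the EDF file at the given path."""
    with open(path, "rb") as handle:
        return parse_fixed_header(handle.read(FIXED_HEADER_SIZE))
```

Calling read_fixed_header on a downloaded EDF file returns a dictionary of strings. Converting values such as num_signals to integers is left to the caller, since headers found in practice are not always well formed.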



Instructions

All of our released corpora are now available in one of four ways:

  1. From the web at:

          https://www.isip.piconepress.com/projects/tuh_eeg/downloads/

    You can directly browse the directories and explore the data. This is convenient if you want to sample the data and explore formats, content, etc.

    The username and password are the same as what you use to access the web-based version of these resources. If you do not have the username and password, register by filling out this form and you will receive this information automatically by email.

  2. Rsync, which is available on Linux and Mac platforms, is our preferred way of downloading data. It allows you to easily keep your copy of the data in sync with ours.

    Windows users can get access to rsync by installing MobaXterm.

    A typical rsync command to download a specific release (e.g., v1.5.2) of a specific corpus (TUSZ) is:

          rsync -auxvL nedc@www.isip.piconepress.com:data/tuh_eeg_seizure/v1.5.2/ .

    Note that the "." at the end of this command is very important since it denotes the destination directory. Without a destination directory specification, the command will not transfer any data.

    The username and password are the same as what you use to access the web-based version of these resources. If you do not have the username and password, register by filling out this form and you will receive this information automatically by email.

    Note that the "-L" option in rsync instructs it to follow links. All of our corpora are linked back to TUEG. If you are downloading the entire suite of corpora, you do not need to use "-L". If you are downloading only one corpus, you need to use "-L".

    Long rsync jobs will often fail because you will lose your network connection. Therefore, we usually wrap long rsync transfers in a script, nedc_rsync.sh. This script will repeatedly run rsync until there is no more data to be transferred.

  3. If Internet connectivity is a problem, you can send us a 4 TB USB drive. We will copy the data to this disk and send it back to you. You must arrange for return postage by providing a UPS or FedEx account number, as described below.

    Please send us a conventional USB-mounted disk drive. We have had problems with other types of media such as thumb drives. Any standard USB-powered, USB 2.0 compatible 4 TB drive, such as a Western Digital or Seagate, will work fine. Because of the time it takes to copy the data, we need a drive that can maintain a stable connection, and thumb drives have proven to be unreliable.

    Mail the drive to:

          Joseph Picone
          Temple University
          Room 703A
          1947 North 12th Street
          Philadelphia, PA 19122
          Tel: 215-204-4841

    Please email us for details before shipping the drive. If you ship us a drive directly from a reseller such as Amazon, please make sure that the shipment contains information that we can use to identify you. This information should include a point of contact (POC), the name of your institution, and contact information (name, surface mail address and telephone number for the POC).

  4. If none of these work for you, a fourth alternative is wget. This is a popular web scraping tool that is available for Windows, Linux and Mac. However, it has some drawbacks: it doesn't follow links as well as rsync, and it produces a lot of extraneous files.

    You can use wget to download our entire corpora directory, but you cannot use it to download individual corpora because it will not follow links properly.

    A typical wget command would look something like this (enter it as a single command; the trailing backslash continues the line):

          wget -cr --no-parent --http-user="username" --http-password="password" \
                https://www.isip.piconepress.com/projects/tuh_eeg/downloads/

    The fields "username" and "password" should be replaced with the actual values you receive from us in email. Note that this will download about 2 TB of data, so it will take a long time.
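
The retry approach described for long rsync transfers in option 2 can be sketched in Python as follows. This is not the nedc_rsync.sh script, whose contents are not shown here. It is a minimal sketch that assumes rerunning rsync until it exits with status 0 is an acceptable stopping rule. The function names, the attempt limit, and the delay are all hypothetical choices made for illustration.

```python
import subprocess
import time


def build_rsync_command(corpus_dir, version, dest="."):
    """Assemble the rsync command shown in the instructions above.

    corpus_dir and version are the path components of the release,
    e.g. "tuh_eeg_seizure" and "v1.5.2". dest is the destination
    directory and must not be omitted.
    """
    source = "nedc@www.isip.piconepress.com:data/%s/%s/" % (corpus_dir, version)
    return ["rsync", "-auxvL", source, dest]


def run_until_success(cmd, max_attempts=50, delay_seconds=30,
                      runner=subprocess.call, sleep=time.sleep):
    """Rerun cmd until it exits with status 0; return the attempt count.

    Assumption of this sketch: a zero exit status means the transfer
    completed. runner and sleep are injectable so the loop can be
    exercised without a network connection.
    """
    for attempt in range(1, max_attempts + 1):
        if runner(cmd) == 0:
            return attempt
        if attempt < max_attempts:
            # Pause before retrying so a dropped connection can recover.
            sleep(delay_seconds)
    raise RuntimeError("command still failing after %d attempts" % max_attempts)
```

A call such as run_until_success(build_rsync_command("tuh_eeg_seizure", "v1.5.2")) would then repeat the transfer after each dropped connection. Because rsync skips files that are already up to date, each retry resumes roughly where the previous one stopped.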

If you are having trouble deciding what to do, email us and describe the specific resources in which you are interested. We will be happy to guide you through the process.




What's New

  • 2021:

    • (20210214) NEDC Annotator (v4.0.3): This version fully supports Python 3.7.x and integrates numerous bug fixes.

    • (20210107) TUH EEG Artifacts (v2.0.0): A new, significantly expanded, version of the artifact corpus in which the entire signal is annotated.

  • 2020:

    • (20201220) NEDC PyPrint Edf (v1.0.0): A simplified version of our software to read header and signal data from an EDF file.

    • (20200925) NEDC Annotator (v4.0.0): This version fully supports Python 3 and integrates numerous bug fixes. We expect to include XML and csv file format support in the next release of this tool.

    • (20200821) NEDC Eval EEG (v4.0.0): This version integrates the competition version of the scoring software with our regular distribution. The NIST software, which is used to implement the ATWV metric, is now optional.

    • (20200725) NEDC TUH EEG Artifact Corpus (v2.0.0): This release includes annotated artifact events in 310 EEG files. The annotations have not yet been reviewed by our senior annotators, but we have released them in hopes of receiving feedback on the data.

    • (20200528) NEDC Eval EEG (v3.3.3): A new version of our software that checks for duplicate hypotheses and checks for overlap between hypotheses.

    • (20200527) NEDC TUH EEG Seizure (v1.5.2): We have released v1.5.2 of the TU Seizure Detection (TUSZ) Corpus. This version includes new annotations for the entire training database.

    • (20200408) Annotation Standards: Our paper describing our annotation standards for the Temple University Hospital EEG Seizure Corpus has been published and is now available.

    • (20200403) NEDC Eval EEG (v3.3.2): In this version of our scoring software, the hypothesis confidence and additional fields are optional.

    • (20200331) NEDC Python Streaming Software (v1.0.1): A new version that is compatible with Python v3. It is functionally the same as v1.0.0.

    • (20200328) NEDC Eval EEG (v3.3.1): A new version of our software that uses a simplified file format. This version was developed to support the Neureka™ 2020 Epilepsy Challenge. The software reads a list of seizure events and scores them against the reference annotations of our recent database release: TUH EEG Seizure Corpus (v1.5.1).

    • (20200320) NEDC TUH EEG Seizure (v1.5.1): We have released v1.5.1 of the TU Seizure Detection (TUSZ) Corpus. We have manually reviewed the annotations for the dev and eval sets in preparation for the Neureka™ 2020 Epilepsy Challenge.

  • 2019:

    • (20190323) NEDC TUH EEG Seizure (v1.5.0): This release includes the expansion of the training dataset from 1,984 files to 4,597. Calibration sequences of the new data have been manually annotated and added to the seizure spreadsheet. Annotation corrections were made to the files already existing in the training set.

    • (20190308) IBM TUSZ Pre-Processed Data (v1.0.0): This is our first release of IBMPPD which preprocesses the TUH Seizure Detection Corpus using two methods, both of which use an FFT sliding window approach (STFT) in the beginning.

  • 2018:

    • (20181206) NEDC TUH EEG Artifact Corpus (v1.0.0): This is our first release of the TUH EEG Artifact Corpus. This corpus was developed to aid in EEG event classification such as seizure detection algorithms. This corpus is a subset of the TUH EEG Corpus and contains files with 5 different types of artifacts: (1) eye movements (EYEM), (2) chewing (CHEW), (3) shivering (SHIV), (4) electrode pop, electrode static, and lead artifacts (ELPP), and (5) muscle artifacts (MUSC).

    • (20181102) NEDC Eval EEG (v1.3.0): In this release, the FPR definition of the TAES metric has been updated to the standard definition which is #FP / (#FP + #TN) or in other words (1 - TNR).

    • (20181022) NEDC TUH EEG Seizure (v1.4.0): This release includes improvements to the quality of annotations. Annotation corrections were made in the development test and training sets.

    • (20180817) NEDC TUH EEG Seizure (v1.3.0): This release contains quality improvements of the annotations, as well as manually labeled calibration sequences. The main reason for this release is that we have created a blind evaluation set, often referred to as a held-out set. This set is not being released, but will be used in an upcoming Kaggle-style challenge hosted by IBM. More details about this challenge will follow within the next few months.

    • (20180710) NEDC EEG AutoAnnotations (v1.1.0): This release includes the addition of automatically generated annotations using a six-way classification approach. In six-way classification, the first three events are of clinical interest: (1) spike and/or sharp waves (SPSW), (2) periodic lateralized epileptiform discharges (PLED), and (3) generalized periodic epileptiform discharges (GPED). The remaining three events are used to model background noise: (1) eye movement (EYEM), (2) artifacts (ARTF), and (3) background (BCKG).

    • (20180621) TUH EEG Events Corpus (v1.0.1): The EDF files in the previous release with corrupted headers have been fixed. All files should pass processing using standard open source tools.

    • (20180609) TUH EEG Corpus (v1.1.0): These annotations are only for EEG files that are under an hour in duration and contain 22 channels.

    • (20180501) NEDC Eval EEG (v1.2.0): This scoring software uses five different scoring methods: NIST, DP Align, Epoch Based, Any-Overlap, and TAES.

    • (20180417) TUH EEG Corpus (v1.1.0): This is the official version of the TUH EEG Corpus. This release organizes each session by the montage definition it fits. Filename conventions have been adjusted and files not containing any usable brain signal have been removed.

    • (20180416) TUH EEG Seizure Corpus (v1.2.1): This release includes small annotation fixes and an addition to the _SEIZURES spreadsheet of calibration start and end times, as well as other small cleanups.

    • (20180412) TUH Abnormal EEG Corpus (v2.0.0): This is a subset of the TUH EEG corpus that can be used for automatic detection of abnormal EEGs. This release contains patient numbers that have been re-mapped to be consistent with v1.1.0 of TUH EEG. Some cleanups, including the removal of duplicate files, have been made.

    • (20180412) TUH EEG Slowing Corpus (v1.0.1): This is a small data set that can be used to study the difference between slowing and seizures in EEGs. This release contains patient numbers that have been re-mapped to be consistent with v1.1.0 of TUH EEG.

    • (20180327) TUH EEG Six-Way Event Classification Corpus (v1.0.0): This release contains the data used to develop our initial version of AutoEEG. Sections of EEG signals are annotated for one of 6 events: spike, gped, pled, eye movement, artifact and background.

  • 2017:

    • (20171207) TUH EEG Seizure Corpus (v1.2.0): This release contains patient numbers that match v1.0.0 of TUH EEG. Numerous other cleanups have been made to the data.

    • (20170923) TUH EEG Corpus (v1.0.0): This is the official version of the TUH EEG Corpus. This release contains sessions recorded between 2002 and 2015. There are 13,500 patients and 23,218 sessions with paired EEG reports.

    • (20170920) TUH EEG Slowing Corpus (v1.0.0): A small data set that can be used to study the difference between seizures and slowing in EEGs.

    • (20170913) TUH Abnormal EEG Corpus (v1.1.2): This release fixes a small bug in that the age information was not correct for 38 files. That has been corrected based on age information in the EEG reports.

    • (20170816) TUH Abnormal EEG Corpus (v1.1.1): This release fixes a small bug. One of the files used a Left Ear (LE) reference. All files are now in Average Reference (AR) format for the EEG signals.

    • (20170805) TUH EEG Seizure Corpus (v1.1.1): In this release, each seizure event is classified by type. The signal data and start/stop times of the events have not changed, but we now provide event-based and term-based annotations as well as the type of seizure.

    • (20170701) TUH EEG Seizure Corpus (v1.1.0): This release contains the expanded training set, sub-one second resolution on the seizure boundaries, and an expanded classification of each EEG session in terms of types and subtypes. Corrupted EDF headers have also been corrected.

    • (20170617) TUH EEG Seizure Corpus (v1.0.4): A new release that contains bug fixes and much more information about the data. Each EEG session is classified by type and duration (e.g., routine or LTM).

    • (20170314) TUH Abnormal EEG Corpus (v1.0.1): This is a subset of the TUH EEG Corpus that can be used for automatic detection of abnormal EEGs. This is a bug fix release. The corpus now contains only one file per session. Also, we provide a suggested partitioning of the data into evaluation and training data.

    • (20170426) TUH EEG Seizure Corpus (v1.0.3): This is the first official release of a subset of the TUH EEG Corpus that has been manually annotated for seizure events.

    • (20170314) TUH Abnormal EEG Corpus (v1.0.0): This is a subset of the TUH EEG Corpus that can be used for automatic detection of abnormal EEGs.

  • 2016:

  • 2015:

    • (20150101) TUH EEG Corpus (v0.6.0): A beta release used to collect feedback from the community.

    • (20150401) TUH EEG Corpus (v0.2.0): This is our first public release of the TUH EEG Corpus. This is a beta release intended to allow users to give us feedback on the data. There are 247 sessions, 615 EDF files, and over 150 hours of EEG data. The uncompressed data occupies about 8.3G of disk space.