To request access to the TUH EEG Corpus, please fill out
You will receive an automatically-generated username and password
via email. Data collected is unencumbered and can be used for both
research and commercialization purposes.
Due to the size of the data, the best way to transfer it is on a hard disk. If you elect this option, you need to send us a 2 TB USB drive and provide a UPS or FedEx account number for return shipping. Mail the drive to:
1947 North 12th Street
Philadelphia, PA 19122
for details before shipping the drive. If you ship us a drive
directly from a reseller such as Amazon, please make sure
that the shipment contains information that we can use to
identify you. This information should include a point of
contact (POC), the name of your institution, and contact
information (name, surface mail address and
telephone number for the POC).
Below are the data, software and other miscellaneous resources currently available from our site:
- (20181206) NEDC TUH EEG Artifact Corpus (v1.0.0): This is our first release of the TUH EEG Artifact Corpus. This corpus was developed to aid EEG event classification tasks such as seizure detection. It is a subset of the TUH EEG Corpus and contains files with 5 different types of artifacts: (1) eye movements (EYEM), (2) chewing (CHEW), (3) shivering (SHIV), (4) electrode pop, electrode static, and lead artifacts (ELPP), and (5) muscle artifacts (MUSC).
- (20181102) NEDC Eval EEG (v1.3.0): In this release, the FPR definition of the TAES metric has been updated to the standard definition, which is #FP / (#FP + #TN), or in other words (1 - TNR).
- (20181022) NEDC TUH EEG Seizure (v1.4.0): This release includes improvements to the quality of annotations. Annotation corrections were made in the development test and training sets.
- (20180817) NEDC TUH EEG Seizure (v1.3.0): This release contains quality improvements to the annotations, as well as manually labeled calibration sequences. The main reason for this release is that we have created a blind evaluation set, often referred to as a held-out set. This set is not being released, but will be used in an upcoming Kaggle-style challenge hosted by IBM. More details about this challenge will follow within the next few months.
- (20180710) NEDC EEG AutoAnnotations (v1.1.0): This release includes the addition of automatically generated annotations using a six-way classification approach. In six-way classification, the first three events are of clinical interest: (1) spike and/or sharp waves (SPSW), (2) periodic lateralized epileptiform discharges (PLED), and (3) generalized periodic epileptiform discharges (GPED). The remaining three events are used to model background noise: (1) eye movement (EYEM), (2) artifacts (ARTF), and (3) background (BCKG).
- (20180621) TUH EEG Events Corpus (v1.0.1): The EDF files in the previous release with corrupted headers have been fixed. All files should pass processing using standard open source tools.
- (20180609) TUH EEG Corpus (v1.1.0): These annotations are only for EEG files that are under an hour in duration and contain 22 channels.
- (20180501) NEDC Eval EEG (v1.2.0): This scoring software reports results using 5 different scoring methods: NIST, DP Align, Epoch Based, Any-Overlap, and TAES.
- (20180417) TUH EEG Corpus (v1.1.0): This is the official version of the TUH EEG Corpus. This release organizes each session by the montage definition it fits. File naming conventions have been adjusted and files not containing any usable brain signal have been removed.
- (20180416) TUH EEG Seizure Corpus (v1.2.1): This release includes small annotation fixes and an addition to the _SEIZURES spreadsheet of calibration start and end times, as well as other small cleanups.
- (20180412) TUH Abnormal EEG Corpus (v2.0.0): This is a subset of the TUH EEG corpus that can be used for automatic detection of abnormal EEGs. This release contains patient numbers that have been re-mapped to be consistent with v1.1.0 of TUH EEG. Some cleanups including the removal of duplicate files have been made.
- (20180412) TUH EEG Slowing Corpus (v1.0.1): This is a small data set that can be used to study the difference between slowing and seizures in EEGs. This release contains patient numbers that have been re-mapped to be consistent with v1.1.0 of TUH EEG.
- (20180327) TUH EEG Six-Way Event Classification Corpus (v1.0.0): This release contains the data used to develop our initial version of AutoEEG. Sections of EEG signals are annotated for one of 6 events: spike, gped, pled, eye movement, artifact and background.
- (20171207) TUH EEG Seizure Corpus (v1.2.0): This release contains patient numbers that match v1.0.0 of TUH EEG. Numerous other cleanups have been made to the data.
- (20170923) TUH EEG Corpus (v1.0.0): This is the official version of the TUH EEG Corpus. This release contains sessions recorded between 2002 and 2015. There are 13,500 patients and 23,218 sessions with paired EEG reports.
- (20170920) TUH EEG Slowing Corpus (v1.0.0): A small data set that can be used to study the difference between seizures and slowing in EEGs.
- (20170913) TUH Abnormal EEG Corpus (v1.1.2): This release fixes a small bug in that the age information was not correct for 38 files. That has been corrected based on age information in the EEG reports.
- (20170816) TUH Abnormal EEG Corpus (v1.1.1): This release fixes a small bug. One of the files used a Left Ear (LE) reference. All files now use the Average Reference (AR) format for the EEG signals.
- (20170805) TUH EEG Seizure Corpus (v1.1.1): In this release, each seizure event is classified by type. The signal data and the start/stop times of the events have not changed, but we now provide event-based and term-based annotations as well as the type of seizure.
- (20170701) TUH EEG Seizure Corpus (v1.1.0): This release contains the expanded training set, sub-one second resolution on the seizure boundaries, and an expanded classification of each EEG session in terms of types and subtypes. Corrupted EDF headers have also been corrected.
- (20170617) TUH EEG Seizure Corpus (v1.0.4): A new release that contains bug fixes and much more information about the data. Each EEG session is classified by type and duration (e.g., routine or LTM).
- (20170314) TUH Abnormal EEG Corpus (v1.0.1): This is a subset of the TUH EEG Corpus that can be used for automatic detection of abnormal EEGs. This is a bug fix release. The corpus now contains only one file per session. Also, we provide a suggested partitioning of the data into evaluation and training data.
- (20170426) TUH EEG Seizure Corpus (v1.0.3): This is the first official release of a subset of the TUH EEG Corpus that has been manually annotated for seizure events.
- (20170314) TUH Abnormal EEG Corpus (v1.0.0): This is a subset of the TUH EEG Corpus that can be used for automatic detection of abnormal EEGs.
- (20160301) TUH EEG Epilepsy Corpus (v0.0.1): This is a subset of the TUH EEG Corpus that contains 100 subjects with and without epilepsy.
- (20150101) TUH EEG Corpus (v0.6.0): A beta release used to collect feedback from the community.
- (20150401) TUH EEG Corpus (v0.2.0): This is our first public release of the TUH EEG Corpus. This is a beta release intended to allow users to give us feedback on the data. There are 247 sessions, 615 EDF files, and over 150 hours of EEG data. The uncompressed data occupies about 8.3 GB of disk space.
- (20180206) NEDC Demo (v0.4.0): A beta release of our visualization tool. This is described in more detail in this paper.
- (20170711) NEDC Eval EEG (v1.0.0): The first release of our scoring software. This is described in more detail in this paper.
- (20170711) PyEDFLib: A Python interface to EDFLib that lets you read and write EDF files (the distribution format for TUH EEG).
- (20170711) edfRead: MATLAB code to read EDF files.
- (20160401) Display Annotations: A simple Python script to view label files.
- (20160401) Print Header: This distribution contains two simple programs, nedc_print_header and nedc_print_signal, that can be used to view EDF file information. For example, nedc_print_header can be used with grep to view a specific header field across all files in the database.
- (20160401) Filename Conventions: Comments are welcome on our proposal for the organization of release v1.0.0 of the TUH EEG Corpus.
- (20150401) EDF Header: A byte-by-byte description of an EDF header.
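To illustrate what a byte-by-byte EDF header description covers, here is a minimal Python sketch that splits the fixed 256-byte portion of an EDF header into its fields. The field widths follow the published EDF specification; the names EDF_FIXED_FIELDS, parse_edf_fixed_header, and read_edf_fixed_header are hypothetical and are not part of any NEDC distribution. The per-signal headers that follow the first 256 bytes are not parsed here.

```python
# Layout of the fixed EDF header: (field name, width in bytes).
# Widths follow the EDF specification and sum to 256 bytes.
# The field names themselves are illustrative, not official identifiers.
EDF_FIXED_FIELDS = [
    ("version", 8),
    ("patient_id", 80),
    ("recording_id", 80),
    ("start_date", 8),
    ("start_time", 8),
    ("header_bytes", 8),
    ("reserved", 44),
    ("num_records", 8),
    ("record_duration", 8),
    ("num_signals", 4),
]


def parse_edf_fixed_header(raw: bytes) -> dict:
    """Split the first 256 bytes of an EDF file into named text fields."""
    if len(raw) < 256:
        raise ValueError("an EDF fixed header needs at least 256 bytes")
    fields = {}
    offset = 0
    for name, width in EDF_FIXED_FIELDS:
        # Header fields are space-padded ASCII text, so strip the padding.
        text = raw[offset:offset + width].decode("ascii", errors="replace")
        fields[name] = text.strip()
        offset += width
    return fields


def read_edf_fixed_header(path: str) -> dict:
    """Read and parse the fixed header of the EDF file at 'path'."""
    with open(path, "rb") as handle:
        return parse_edf_fixed_header(handle.read(256))
```

Calling read_edf_fixed_header on an EDF file returns a dictionary of strings; values such as num_signals are left as text so that malformed headers can be inspected rather than rejected.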
The TUH EEG Corpus is freely available. The only reason we require
registration is that we need to track who downloads the data. We also
want to be able to inform you of any updates to the releases.
Once you have obtained the username and password, you can selectively download portions of the corpus using your browser.
You might also be interested in using a command like wget to download the corpus. A typical wget command would look something like this:
wget -r --no-parent --http-user="username" --http-password="password" https://www.isip.piconepress.com/projects/tuh_eeg/downloads/tuh_eeg/v0.6.0/
The fields "username" and "password" should be replaced with the actual values you receive from us in email.
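If you would rather script a download in Python than use wget, the following is a minimal sketch of fetching a single file with your credentials. It assumes the server accepts HTTP Basic authentication, which the wget options above suggest but which is not confirmed here. The function names basic_auth_header and fetch_file are hypothetical, and unlike wget -r this sketch does not recurse through directories.

```python
import base64
import urllib.request


def basic_auth_header(username: str, password: str) -> str:
    """Build the value of an HTTP Basic 'Authorization' header."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8"))
    return "Basic " + token.decode("ascii")


def fetch_file(url: str, username: str, password: str, dest: str) -> None:
    """Download one URL to 'dest' (assumes Basic authentication)."""
    request = urllib.request.Request(
        url,
        headers={"Authorization": basic_auth_header(username, password)},
    )
    with urllib.request.urlopen(request) as response, open(dest, "wb") as out:
        # Copy in 1 MiB chunks so large EDF files are not held in memory.
        while True:
            chunk = response.read(1 << 20)
            if not chunk:
                break
            out.write(chunk)
```

To use it, pass the full URL of one file under the release directory, the username and password you received by email, and a local destination path.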