Each release of transcription data for this project will be a superset
of the previous release (in other words, you need only download the latest
release). All transcriptions and segmentations developed in this project
are based on the audio data from the following SWITCHBOARD release:
Switchboard-1 Telephone Speech Corpus: Release 2, August 1997
For information regarding SWITCHBOARD, please consult the
LDC web site.
For more details about this project, see the
is also available to discuss progress and key issues of the project.
Transcriptions and Word Alignments:
Download the manually corrected word alignments: Several lexicon
items were fixed in the 10/19/02 release, and about 45 start/stop
times that had negative durations (the stop time preceded the start
time) were repaired. We are no longer actively developing this
resource, but we continue to release bug fixes. This release includes
the final transcriptions for the entire database, the complete
lexicon, and automatic word alignments.
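The negative-duration errors mentioned above are easy to screen for in any local copy of the alignments. The sketch below assumes a hypothetical whitespace-separated line layout of `<utterance-id> <start> <stop> <word>` (times in seconds); check the actual file format in the release before relying on it.

```python
"""Flag word-alignment entries whose stop time precedes their start time.

Assumes a hypothetical line layout (verify against the real files):
    <utterance-id> <start-seconds> <stop-seconds> <word>
"""
from typing import Iterable, List, NamedTuple


class WordTime(NamedTuple):
    utt_id: str
    start: float
    stop: float
    word: str


def parse_line(line: str) -> WordTime:
    # Expect exactly four whitespace-separated fields.
    utt_id, start, stop, word = line.split()
    return WordTime(utt_id, float(start), float(stop), word)


def negative_durations(lines: Iterable[str]) -> List[WordTime]:
    """Return every entry whose stop time is earlier than its start time."""
    bad = []
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        entry = parse_line(line)
        if entry.stop < entry.start:
            bad.append(entry)
    return bad


if __name__ == "__main__":
    # Made-up sample lines in the assumed layout.
    sample = [
        "sw2001A-ms98-a-0001 0.000000 0.977625 [silence]",
        "sw2001A-ms98-a-0002 1.250000 1.100000 hi",
        "sw2001A-ms98-a-0002 1.300000 1.600000 um",
    ]
    for e in negative_durations(sample):
        print(e.utt_id, e.word, round(e.stop - e.start, 3))
```

Run over a whole alignment file (for example, `negative_durations(open(path))`), an empty result means no entry has its stop time before its start time.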
Download the ICSI Transcriptions:
This release differs from the 03/15/01 release in only one utterance:
two utterances were merged to form one utterance, and the phone
transcriptions were corrected.
The original ICSI data is available from the
WS97 ftp site
at the Center for Language and Speech Processing (CLSP)
at Johns Hopkins University. It can also be downloaded from the
of this data.
Download the Penn Treebank Transcriptions:
The 10/19/02 release contains a few bug fixes that reflect the
changes to the word alignments and segmentations described above.
This Penn Treebank release contains an alignment
of the ISIP hand-aligned word transcriptions to the Penn Treebank
word transcriptions for all 1126 SWB conversations that are
included in the Treebank. For words that agree between the two
transcriptions, time marks are given. For words
that do not agree, we estimate the times for the Treebank
transcriptions using the ISIP transcriptions. The transcriptions
also include all instances of silence, laughter and noise.
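The page does not say how times are estimated for words on which the two transcriptions disagree. As an illustration only, the sketch below uses one simple approach, which is not necessarily the method used here: it aligns the two word sequences with Python's `difflib.SequenceMatcher`, copies times for words that agree, and divides the gap between the nearest timed neighbours evenly among any run of unmatched words. The function name `transfer_times` and the `(word, start, stop)` tuple layout are hypothetical.

```python
"""Transfer word times from a hand-aligned transcript to a second transcript.

Illustrative sketch (assumed technique): matched words copy their times;
each run of unmatched words evenly splits the gap between timed neighbours.
"""
from difflib import SequenceMatcher
from typing import List, Optional, Tuple

Timed = Tuple[str, float, float]  # hypothetical (word, start, stop) layout


def transfer_times(timed: List[Timed], target: List[str]) -> List[Timed]:
    source_words = [w.lower() for w, _, _ in timed]
    target_words = [w.lower() for w in target]
    times: List[Optional[Tuple[float, float]]] = [None] * len(target)

    # Step 1: words that agree between the transcripts inherit their times.
    matcher = SequenceMatcher(a=source_words, b=target_words, autojunk=False)
    for block in matcher.get_matching_blocks():
        for k in range(block.size):
            _, start, stop = timed[block.a + k]
            times[block.b + k] = (start, stop)

    # Step 2: fill each run of untimed words by splitting the gap evenly.
    i = 0
    while i < len(target):
        if times[i] is not None:
            i += 1
            continue
        j = i
        while j < len(target) and times[j] is None:
            j += 1  # j is now one past the end of the untimed run
        if i > 0:
            left = times[i - 1][1]
        else:
            left = timed[0][1] if timed else 0.0
        if j < len(target):
            right = times[j][0]
        else:
            right = timed[-1][2] if timed else left
        step = (right - left) / (j - i)
        for k in range(i, j):
            times[k] = (left + (k - i) * step, left + (k - i + 1) * step)
        i = j

    return [(w, s, e) for w, (s, e) in zip(target, times)]


if __name__ == "__main__":
    hand_aligned = [("so", 0.0, 0.2), ("i", 0.2, 0.3),
                    ("think", 0.3, 0.6), ("yeah", 0.9, 1.2)]
    for word, start, stop in transfer_times(hand_aligned,
                                            ["so", "I", "thought", "yeah"]):
        print(f"{word:8s} {start:.2f} {stop:.2f}")
```

With this scheme, an unmatched word inherits the whole gap between its neighbours, including any pause. If the neighbours are adjacent, an inserted word gets zero duration. A real system would more likely re-align such words against the audio.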
provide on-line feedback about key issues.
download a document describing our transcription conventions.
download a statistical analysis of the SWB corpus.
A copy of the SWB models file that we use.
an on-line educational resource for learning about the SWB
Quarterly reports summarizing the progress made on the project.
our transcription and segmentation tool.
this is a simple C program to correct Switchboard files that
have been corrupted by flipping of their bits.
download a public domain speech recognition
system under development in ISIP.
an overview of the SWITCHBOARD (SWB) resegmentation project.
Personnel: the people that make SWB resegmentation happen.
do you want to be a SWITCHBOARD validator?
Timesheets: a list of due dates for timesheets.
summer workshops on conversational speech recognition.
Please direct questions or comments to