Public release of Haitian Creole language data by Carnegie Mellon
The Language Technologies Institute (LTI) of Carnegie Mellon
University's School of Computer Science (CMU SCS) is making publicly
available the Haitian Creole spoken and text data that we have
collected or produced. We are providing this data with minimal
restrictions (see license) in order to allow others to develop language
technology for Haiti, in parallel with our own efforts to help with this crisis.
Since organizing the data in a useful fashion is not instantaneous,
and more text data is currently being produced by collaborators, we
will be publishing the data incrementally on the web, as it becomes
available.
Orthography
Several spelling systems exist for Haitian Creole. We use the official
orthography for Haitian Creole, as defined by the IPN (Institut
Pedagogique National) in 1979.
Haitian Creole data
Data License
Haitian Creole Speech data
Update: Directory reorganized, ASR models added, noon EST on 28 January 2010.
Update: Additional speech data added (data2), 12:45pm EST on 2 February 2010.
Update: Added detailed description of speech data collection methodology on
24 March 2010.
Speech data originally collected by the U.S. DARPA-funded DIPLOMAT project
Haitian Creole Text data
Update: This directory received an important update at 1 p.m. EST on
27 January 2010. If you have already used this data, please revisit it.
Various text data, including:
- Medical domain phrases and sentences collected at Carnegie Mellon under the
U.S. NSF-funded (jointly with the E.U.) NESPOLE! project and translated into
Haitian Creole by Eriksen Translations Inc.
- Parallel text data created by the U.S. DARPA-funded DIPLOMAT project
Acknowledgements
In addition to the members of the projects cited above:
Jeff Allen, SAP (formerly of Carnegie Mellon)
Vigdis Eriksen, Eriksen Translations Inc.
Manuel Stoeckl, Eriksen Translations Inc.
Karen Wallace
and these current Carnegie Mellon members:
Vamshi Ambati
Gopala Krishna Anumanchipalli
Alan W Black
Ralf Brown
Jaime Carbonell
Robert Frederking
Greg Hanneman
Sanjika Hewavitharana
David Huggins-Daines
Alon Lavie
Stephan Vogel
Contact for this page: Robert Frederking