LOGIOS
Lexicon Generation Tool

An example

If your input file looks something like this:

Hello
world
compound_word
hyphen-ated
ONE23
2008
boom!
kweezlebotter

Your output file will look something like this:

HELLO	HH EH L OW
HELLO(1)	HH AH L OW
WORLD	W ER L D
COMPOUND_WORD	K AA M P AW N D W ER D
HYPHEN-ATED	HH AY F AH N EY T IH D
ONE23	OW EH N IY T UW TH R IY
2008	T UW Z IY R OW Z IY R OW EY T
BOOM!	B UW M
KWEEZLEBOTTER	K W IY Z L AH B AA T AH R
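The output format in this example can be read back with a few lines of code, which is handy when you want to inspect or post-process the generated dictionary. The Python sketch below assumes each line holds a headword, an optional instance id such as "(1)", whitespace (tab or spaces), and then space-separated phones, as in the example; `parse_entry` and `load_dict` are hypothetical helper names, not part of LOGIOS.

```python
import re
from collections import defaultdict

# Assumed line layout, inferred from the example output:
#   HEADWORD[(n)] <whitespace> PHONE PHONE ...
ENTRY = re.compile(r"^(?P<word>\S+?)(?:\((?P<inst>\d+)\))?\s+(?P<phones>.+)$")


def parse_entry(line):
    """Split one dictionary line into (headword, instance id, phone list).

    A line without an explicit instance id is treated as instance 0.
    """
    m = ENTRY.match(line.strip())
    if m is None:
        raise ValueError(f"malformed entry: {line!r}")
    inst = int(m.group("inst")) if m.group("inst") else 0
    return m.group("word"), inst, m.group("phones").split()


def load_dict(lines):
    """Group every pronunciation under its headword, in file order."""
    prons = defaultdict(list)
    for line in lines:
        if line.strip():  # skip blank lines
            word, _, phones = parse_entry(line)
            prons[word].append(phones)
    return dict(prons)
```

For example, loading the lines "HELLO\tHH EH L OW" and "HELLO(1)  HH AH L OW" yields a single HELLO key with two phone lists, one per pronunciation.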

Please note the following:

Some words may have multiple pronunciations; these will appear on separate lines and will be distinguished by an instance id such as "(1)". The current implementation of the Sphinx decoder expects each dictionary entry to be unique. Note, however, that this tool does not check for uniqueness: if you include multiple instances of an input word, it will appear multiple times in the output. As a rule, sort your input file and remove duplicate words before you submit it.
Words with internal separators such as "_" and "-" will be rendered as a single word; the internal characters will be kept as part of the orthographic element.
Alphanumeric items, as well as numbers, will be rendered character by character. This is because such items are ambiguous and can be spoken several ways (e.g., "one two three", "one twenty-three", etc.). It is your responsibility to determine how such items will be spoken; typically this will vary by domain.
Punctuation marks will be ignored in the pronunciation (e.g., "boom!" is pronounced B UW M).
Words that do not exist in the tool's dictionary will be generated according to letter-to-sound rules. There is no guarantee that such a pronunciation will be correct. You are advised to check these before use.
If you choose to alter pronunciations manually, be sure to follow the same format and to use only phones from the legal phone set.
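The checks recommended above (removing duplicate input words, and keeping hand edits within the legal phone set) can be scripted. The Python sketch below is illustrative only: it assumes the legal phone set is the 39-phone CMUdict inventory without stress digits (verify this against the phone set your recognizer actually uses), and it upper-cases input words because the example output shows headwords in upper case. `prepare_input`, `illegal_phones`, and `LEGAL_PHONES` are hypothetical names.

```python
# Assumption: the legal phone set is the 39-phone CMUdict inventory
# (no stress digits). Replace it with your recognizer's phone set if it differs.
LEGAL_PHONES = frozenset(
    """
    AA AE AH AO AW AY B CH D DH EH ER EY F G HH IH IY JH K L M N NG
    OW OY P R S SH T TH UH UW V W Y Z ZH
    """.split()
)


def prepare_input(words):
    """Upper-case, de-duplicate and sort input words.

    Upper-casing mirrors the example output (Hello -> HELLO), so "Hello"
    and "HELLO" collapse into one entry instead of producing two.
    """
    return sorted({w.strip().upper() for w in words if w.strip()})


def illegal_phones(pronunciation):
    """Return the phones in a pronunciation string that are not legal."""
    return [p for p in pronunciation.split() if p not in LEGAL_PHONES]
```

For instance, `prepare_input(["world", "Hello", "HELLO"])` returns `['HELLO', 'WORLD']`, and `illegal_phones("HH EH1 L OW")` returns `['EH1']`, flagging a stress-marked phone that the assumed set does not contain.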

word file:
hand file:		Hand-crafted pronunciations that override sub-optimal (e.g. incorrect) pronunciations.
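One way to apply a hand file is to let its entries replace the generated pronunciations of the same headword. The Python sketch below assumes both files use the dictionary format shown in the example and that a hand-crafted headword replaces all of its generated variants; it only illustrates the idea and is not how LOGIOS itself merges the files. `read_entries`, `apply_hand_file`, and `format_dict` are hypothetical names.

```python
import re

# An optional instance id such as "(1)" at the end of a headword.
INSTANCE_ID = re.compile(r"\(\d+\)$")


def read_entries(lines):
    """Map each headword to its list of pronunciations (phone lists)."""
    entries = {}
    for line in lines:
        parts = line.split(None, 1)
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        head, phones = parts
        word = INSTANCE_ID.sub("", head)  # HELLO(1) -> HELLO
        entries.setdefault(word, []).append(phones.split())
    return entries


def apply_hand_file(generated, hand):
    """Hand-crafted headwords replace all generated variants of that word."""
    merged = dict(generated)
    merged.update(hand)
    return merged


def format_dict(entries):
    """Write entries back out sorted by headword.

    As in the example, the first variant has no id and later variants
    are numbered (1), (2), ...
    """
    lines = []
    for word in sorted(entries):
        for i, phones in enumerate(entries[word]):
            head = word if i == 0 else f"{word}({i})"
            lines.append(f"{head}\t{' '.join(phones)}")
    return lines
```

Replacing a headword wholesale, rather than appending to it, keeps an incorrect generated variant from surviving next to the corrected one.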
