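python - How can I check whether an ARPAbet pronunciation exists in the CMU Pronouncing Dictionary?

Problem description

Given a list of ARPAbet tokens encoding an arbitrary utterance, I would like to check whether that ARPAbet string actually appears in the CMU Pronouncing Dictionary. If it does, I also want the word it matches in the dictionary.

Is there a way to do this in Python?

Tags: python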

I'm not sure whether you want to use CMU Sphinx from Python or do this yourself, but either way I came up with a reasonable solution. Here is the code:

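#!/usr/bin/env python3
# Load the dictionary: each line is a word followed by its phonemes.
cmu_dict = {}
with open('dic.dict') as f:
    for entry in f:
        tokens = entry.split()
        if not tokens:
            continue  # skip blank lines, which would otherwise raise IndexError
        cmu_dict[tokens[0]] = tokens[1:]  # key: word, value: list of phonemes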

my_arpabets_list = [
    ['P', 'AH', 'L', 'IY', 'S', 'M', 'IH', 'N'],
    ['D', 'IH', 'L', 'AH', 'T', 'EY', 'SH', 'AH', 'N'],
]

for arpabet_tokens in my_arpabets_list:
    found = False
    for word, pronunciation in cmu_dict.items():
        if pronunciation == arpabet_tokens:
            print('match: %s %s' % (word, ' '.join(pronunciation)))
            found = True
            break
    if not found:
        print('error: could not find a word for tokens %s' % arpabet_tokens)

Running the code gives the following output:

$ ./read.py 
match: policemen P AH L IY S M IH N
error: could not find a word for tokens ['D', 'IH', 'L', 'AH', 'T', 'EY', 'SH', 'AH', 'N']

Assuming you have Pocketsphinx installed, you can test this by replacing the file dic.dict with the default English dictionary that ships with the package, at /usr/local/share/pocketsphinx/model/en-us/cmudict-en-us.dict.
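
The loop above scans the whole dictionary for every query and stops at the first hit, so homophones are never reported. If you have many lookups, one possible variation is to invert the dictionary once, keyed by pronunciation. The following is only a sketch: the helper names build_reverse_index and find_words are made up for this example, the sample entries are illustrative lines written in the dictionary's "word PHONEME PHONEME ..." format, and it assumes alternate pronunciations are marked with a suffix such as "read(2)" and that comment lines, if any, start with ";;;".

```python
import re
from collections import defaultdict

# Assumption: alternate pronunciations are written as "word(2)", "word(3)", ...
_VARIANT_SUFFIX = re.compile(r'\(\d+\)$')


def build_reverse_index(lines):
    """Map each pronunciation (a tuple of phonemes) to the words that have it."""
    index = defaultdict(list)
    for line in lines:
        tokens = line.split()
        if len(tokens) < 2 or line.startswith(';;;'):
            continue  # skip blank lines, comments, and entries without phonemes
        word = _VARIANT_SUFFIX.sub('', tokens[0])  # "read(2)" -> "read"
        index[tuple(tokens[1:])].append(word)
    return index


def find_words(index, arpabet_tokens):
    """Return every word whose pronunciation equals arpabet_tokens."""
    return index.get(tuple(arpabet_tokens), [])


# Tiny in-memory sample standing in for the dictionary file (illustrative only).
sample = [
    'policemen P AH L IY S M IH N',
    'read R EH D',
    'read(2) R IY D',
    'red R EH D',
]
index = build_reverse_index(sample)
print(find_words(index, ['R', 'EH', 'D']))   # both homophones are returned
print(find_words(index, ['D', 'IH', 'L']))   # empty list when nothing matches
```

To use it on a real file, pass the open file object instead of the sample list, for example build_reverse_index(open('dic.dict')). Lists are not hashable, so the pronunciations are stored as tuples to serve as dictionary keys.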
