How to create a spell checker that reads a txt file of correctly spelled words and suggests corrections to the user based on their txt word file

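Question

I am currently trying to make a spell checker that reads my .txt file and the user's .txt file and suggests words that are likely to be correct. This is what I have so far.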

def main():
    # Keep asking for the file to check until one can be opened.
    while True:
        try:
            mistake = input("file to check: ")
            a = open(mistake, 'r')
            break
        except OSError:
            print('this file does not exist')
    # Keep asking for the known-words file; pressing Enter selects the default.
    while True:
        try:
            spell = input("file with known words [enter for default]: ")
            if spell == '':
                spell = 'default_words.txt'
            b = open(spell, 'r')
            break
        except OSError:
            print('this file does not exist')
    # common.txt has a fixed name, so retrying cannot help: stop instead of looping forever.
    try:
        c = open('common.txt', 'r')
    except OSError:
        print('you do not have common.txt on your device')
        return

    print("---------------------------")

I want to use Levenshtein distance to solve this, but I am not sure where to start. According to the instructions, I need to define the function

def simplified_lev(a :str, b : str):

simplified_lev(a,b) = max( len(a*), len(b*) )

as well as a helper function:

def ab_star(a : str, b:str)

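where a* is what remains of string a after all matching leading and trailing characters of a and b have been removed, and b* is what remains of string b after the same operation.

Tags: python, python-3.x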

Answer

Here is my suggestion:

from typing import Tuple, Iterable, Any


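def longest_common_prefix(s1: Iterable[Any], s2: Iterable[Any]) -> int:
    """Return the number of leading items that s1 and s2 have in common."""
    index = -1  # so that an empty input yields 0 at the final return
    for index, (c1, c2) in enumerate(zip(s1, s2)):
        if c1 != c2:
            return index  # first mismatch: `index` items matched before it
    return index + 1  # no mismatch: the shorter input is a prefix of the other


def ab_star(a: str, b: str) -> Tuple[str, str]:
    """Strip the matching leading and trailing characters from a and b."""
    prefix_length = longest_common_prefix(a, b)
    # Look for the common suffix only in what is left after the prefix,
    # so that the prefix and the suffix can never overlap.
    suffix_length = longest_common_prefix(reversed(a[prefix_length:]), reversed(b[prefix_length:]))
    return a[prefix_length:len(a) - suffix_length], b[prefix_length:len(b) - suffix_length]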


def simplified_lev(a: str, b: str) -> int:
    a_star, b_star = ab_star(a, b)
    return max(len(a_star), len(b_star))
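
To turn the distance into suggestions, one option is to rank the known words by their simplified_lev distance to the misspelled word and keep the closest few. The following is only a minimal sketch under assumptions: `suggest` and its `max_suggestions` parameter are hypothetical names that do not appear in the question, and `simplified_lev` is restated in a compact form only so that the snippet runs on its own.

```python
from typing import Iterable, List


def simplified_lev(a: str, b: str) -> int:
    # Compact restatement of the function above so this snippet is self-contained.
    prefix = 0
    while prefix < min(len(a), len(b)) and a[prefix] == b[prefix]:
        prefix += 1
    a, b = a[prefix:], b[prefix:]  # drop the common prefix first
    suffix = 0
    while suffix < min(len(a), len(b)) and a[-1 - suffix] == b[-1 - suffix]:
        suffix += 1
    # Both remainders lose `suffix` characters, so subtract it from the longer one.
    return max(len(a), len(b)) - suffix


def suggest(word: str, known_words: Iterable[str], max_suggestions: int = 3) -> List[str]:
    # Hypothetical helper: rank known words by their distance to `word`.
    # sorted() is stable, so ties keep the order in which known_words yields them.
    ranked = sorted(known_words, key=lambda known: simplified_lev(word, known))
    return ranked[:max_suggestions]


if __name__ == "__main__":
    print(suggest("helo", ["hello", "help", "world"], max_suggestions=2))
```

If the known words are stored in a set, the order of equally distant suggestions is arbitrary. Passing a sorted list makes the result reproducible.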

Here is a test based on the inputs you provided; it passes:

from dataclasses import dataclass

@dataclass
class TestData:
    a: str
    b: str
    a_star: str
    b_star: str
    simplified_lev: int


test_data = (
    TestData("abc",      "abc",       "",       "",      0),
    TestData("abc",      "vwxyz",     "abc",    "vwxyz", 5),
    TestData("abcxyz",   "abqq",      "cxyz",   "qq",    4),
    TestData("abc23xyz", "aWz",       "bc23xy", "W",     6),
    TestData("abcONE",   "abc",       "ONE",    "",      3),
    TestData("abcONE",   "ANE",       "abcO",   "A",     4),
    TestData("abXyz",    "abXCATSwz", "y",      "CATSw", 5),
)


def test__ab_star() -> None:
    for data in test_data:
        assert ab_star(data.a, data.b) == (data.a_star, data.b_star)


def test__simplified_lev() -> None:
    for data in test_data:
        assert simplified_lev(data.a, data.b) == data.simplified_lev


if __name__ == "__main__":
    test__ab_star()
    test__simplified_lev()
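
To connect this to the files opened in the question, the words still have to be read from those files and compared against the known words. The sketch below shows one possible way to do that. It assumes the word files contain whitespace-separated words and that matching should ignore case and surrounding punctuation; none of this is stated in the question, and `load_words` and `find_unknown` are hypothetical helper names.

```python
import string
from pathlib import Path
from typing import List, Set


def load_words(path: str) -> Set[str]:
    # Assumption: the file holds whitespace-separated words (for example one per line).
    # Raises FileNotFoundError if the file does not exist.
    return set(Path(path).read_text(encoding="utf-8").lower().split())


def find_unknown(text: str, known: Set[str]) -> List[str]:
    # Return the words of `text` that are not in `known`, in order of first appearance.
    unknown: List[str] = []
    for token in text.lower().split():
        word = token.strip(string.punctuation)  # "fox," becomes "fox"
        if word and word not in known and word not in unknown:
            unknown.append(word)
    return unknown


if __name__ == "__main__":
    known = {"the", "quick", "brown", "fox"}
    print(find_unknown("The quikc brown fox, the fxo.", known))
```

Each word returned by `find_unknown` can then be compared against the known words with `simplified_lev` to pick the closest candidates.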
