Using a Levenshtein function to find the closest name

Problem description

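I need help with the code below. I need to find the word in a list that is closest to an input word. To test it, I set word_0 to "pikaru", which should return "pikachu". The levenshtein_distance function returns the distance between the two words passed to it. When I run the code below, I get charmander, which is far off. Any help would be appreciated.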

import backend
name_to_stats, id_to_name, names, \
        pokemon_by_type = backend.get_pokemon_stats()
words = names


word_0 = 'pikaru'
def find_closest_word(word_0, words):
    """Finds the closest word in the list to word_0 as measured by the
    Levenshtein distance

    Args:
        word_0: a str
        words: a list of str

    Returns:
        The closest word in words to word_0 as a str.
    """
    # Hint: use the levenshtein_distance() function to help you out here.
    closest_word = words[0]
    #closest_distance = levenshtein_distance(word_0, words[0])

    for i in words:
        distance = levenshtein_distance(word_0, closest_word)
        new_distance = levenshtein_distance(word_0, i)
        if distance < new_distance:
            return i





def levenshtein_distance(s1, s2):
    """Returns the Levenshtein distance between strs s1 and s2

    Args:
        s1: a str
        s2: a str
    """
    # This function has already been implemented for you.
    # Source of the implementation:
    # https://stackoverflow.com/questions/2460177/edit-distance-in-python
    # If you'd like to know more about this algorithm, you can study it in
    # CSCC73 Algorithms. It applies an advanced technique called dynamic
    # programming.
    # For more information:
    # https://en.wikipedia.org/wiki/Levenshtein_distance
    # https://en.wikipedia.org/wiki/Dynamic_programming
    if len(s1) > len(s2):
        s1, s2 = s2, s1

    distances = range(len(s1) + 1)
    for i2, c2 in enumerate(s2):
        distances_ = [i2+1]
        for i1, c1 in enumerate(s1):
            if c1 == c2:
                distances_.append(distances[i1])
            else:
                distances_.append(1 + min((distances[i1], distances[i1 + 1],
                                       distances_[-1])))
        distances = distances_
    return distances[-1]

Tags: python

Solution


It looks like the bug is in the return statement of your find_closest_word function:

if distance < new_distance:
    return i

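This function does not find the closest word. It returns the first word in the list whose distance from word_0 is greater than that of words[0], and it returns None if no such word exists. Instead, loop over words and keep track of the best word you have seen so far. Something like: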

best_distance = levenshtein_distance(word_0, words[0])
best_word = words[0]
for w in words:
    d = levenshtein_distance(word_0, w)
    if d < best_distance:
        best_distance = d
        best_word = w

return best_word
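
For reference, here is a minimal self-contained sketch of the same idea that can be run without the backend module. It assumes the levenshtein_distance implementation from the question (copied with only cosmetic changes) and uses a hypothetical sample_names list in place of the real names list. It also uses the built-in min() with a key function, which is just a compact way of writing the "track the best so far" loop above, not the only way to do it.

```python
def levenshtein_distance(s1, s2):
    """Edit distance between s1 and s2 (same algorithm as in the question)."""
    if len(s1) > len(s2):
        s1, s2 = s2, s1
    distances = list(range(len(s1) + 1))
    for i2, c2 in enumerate(s2):
        new_distances = [i2 + 1]
        for i1, c1 in enumerate(s1):
            if c1 == c2:
                new_distances.append(distances[i1])
            else:
                new_distances.append(
                    1 + min(distances[i1], distances[i1 + 1], new_distances[-1])
                )
        distances = new_distances
    return distances[-1]


def find_closest_word(word_0, words):
    """Return the word in words with the smallest distance to word_0."""
    # min() scans the whole list and keeps the first word with the
    # lowest distance, so no early return can cut the search short.
    return min(words, key=lambda w: levenshtein_distance(word_0, w))


# Hypothetical sample list standing in for the names returned by backend.
sample_names = ["charmander", "bulbasaur", "pikachu", "squirtle"]
closest = find_closest_word("pikaru", sample_names)
print(closest)  # prints pikachu
```

Note that min() raises ValueError on an empty list, so a caller that may pass an empty words list would need to handle that case separately.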
