首页 > 解决方案 > 收集 Trie 节点下所有完整单词的后缀(在 Python 中使用递归)

问题描述

我需要添加列出后缀的功能来实现我们的自动完成功能。为此,我在 TrieNode 对象上实现了一个函数,该函数将返回 trie 中存在的所有完整单词后缀。例如,如果我们的 Trie 包含单词 ["fun", "function", "factory"] 并且我们从f节点请求后缀,我们希望["un", "unction", "actory"]node.get_suffixes(). 这是我开始的方式:

class TrieNode:

    def __init__(self):
        ## Initialize this node in the Trie
        self.word_end = False
        self.children = dict()

    def insert(self, char):
        ## Add a child node in this Trie
        if not char in self.children:
            self.children[char] = TrieNode()

    def get_suffixes(self):
        pass

我已经get_suffixes单独测试了该功能,它似乎工作正常。

result = []
def get_suffixes(node, suffix=""):
    if not node.children == dict():
        for key in node.children:
            suffix += key
            if node.children[key].word_end:
                result.append(suffix)
            get_suffixes(node.children[key], suffix)
            suffix = suffix[:-1]
    return result

我如何测试该功能:

# Create a mock trie for the test
node = TrieNode()
node.insert("A")
node.children["A"].word_end = True
node.children["A"].insert("t")
node.children["A"].children["t"].word_end = True
node.children["A"].insert("b")
node.children["A"].children["b"].insert("a")
node.children["A"].children["b"].children["a"].insert("c")
node.children["A"].children["b"].children["a"].children["c"].insert("a")
node.children["A"].children["b"].children["a"].children["c"].children["a"].word_end = True
node.children["A"].insert("d")
node.children["A"].children["d"].insert("d")
node.children["A"].children["d"].children["d"].word_end = True
node.children["A"].children["d"].insert("m")
node.children["A"].children["d"].children["m"].insert("i")
node.children["A"].children["d"].children["m"].children["i"].insert("n")
node.children["A"].children["d"].children["m"].children["i"].children["n"].word_end = True

result = []
def get_suffixes(node, suffix=""):
    if not node.children == dict():
        for key in node.children:
            suffix += key
            if node.children[key].word_end:
                result.append(suffix)
            get_suffixes(node.children[key], suffix)
            suffix = suffix[:-1]
    return result

get_suffixes(node.children["A"]) # Returns ['t', 'baca', 'dd', 'dmin'], as expected

当我尝试将get_suffixes函数移动到TrieNode类时出现问题。在这里我不知道应该如何处理全局变量result。它不再应该是全局变量。我试过两个版本:

版本 I:创建result一个类属性

class TrieNode:

    def __init__(self):
        ## Initialize this node in the Trie
        self.word_end = False
        self.children = dict()
        self.result = []

    def insert(self, char):
        ## Add a child node in this Trie
        if not char in self.children:
            self.children[char] = TrieNode()

    def get_suffixes(self, suffix=""):
        if not self.children == dict():
            for key in self.children:
                suffix += key
                if self.children[key].word_end:
                    self.result.append(suffix)
                self.children[key].get_suffixes(suffix)
                suffix = suffix[:-1]   
        return self.result 

node.children["A"].get_suffixes() # Returns ['t'], which is wrong

版本二:做result一个默认的函数参数

class TrieNode:

    def __init__(self):
        ## Initialize this node in the Trie
        self.word_end = False
        self.children = dict()

    def insert(self, char):
        ## Add a child node in this Trie
        if not char in self.children:
            self.children[char] = TrieNode()

    def suffixes(self, suffix="", result=[]):
        if not self.children == dict():
            for key in self.children:
                suffix += key
                if self.children[key].word_end:
                    result.append(suffix)
                self.children[key].suffixes(suffix)
                suffix = suffix[:-1]   
        return result

node.children["A"].suffixes() # Returns ['t', 'baca', 'dd', 'dmin']
node.children["A"].suffixes() # Returns ['t', 'baca', 'dd', 'dmin', 't', 'baca', 'dd', 'dmin']

版本 II 的结果并不令人惊讶,因为:

def append(number, number_list=[]):
    number_list.append(number)
    print(number_list)
    return number_list

append(5) # expecting: [5], actual: [5]
append(7) # expecting: [7], actual: [5, 7]
append(2) # expecting: [2], actual: [5, 7, 2]

我正在学习 Python 中的算法和数据结构。我被要求使用递归函数来做到这一点。其他方法,例如实现 Trie 以支持 Python中的自动完成功能并不是我期望的答案,尽管它们本身可能能够解决问题。我非常好奇为什么self.result在版本 I 中没有正确修改,但如果它不驻留在类中则可以正常工作。

标签: pythonpython-3.xalgorithm

解决方案


result属于类TrieNode

当您self.result从该get_suffixes方法返回时,您只包括在当前TrieNode实例中找到的答案。

您还需要包括其子项找到的答案。多亏了递归,代码只需要稍作更改,添加即可self.result+=self.children[key].get_suffixes(suffix)使一切正常。

class TrieNode:
    def __init__(self):
        ## Initialize this node in the Trie
        self.word_end = False
        self.children = dict()
        self.result = []

    def insert(self, char):
        ## Add a child node in this Trie
        if not char in self.children:
            self.children[char] = TrieNode()

    def get_suffixes(self, suffix=""):
        if not self.children == dict():
            for key in self.children:
                suffix += key
                if self.children[key].word_end:
                    self.result.append(suffix)
                else:
                    self.result+=self.children[key].get_suffixes(suffix)
                suffix = suffix[:-1]   
        return self.result 



# Create a mock trie for the test
node = TrieNode()
node.insert("A")
node.children["A"].word_end = True
node.children["A"].insert("t")
node.children["A"].children["t"].word_end = True
node.children["A"].insert("b")
node.children["A"].children["b"].insert("a")
node.children["A"].children["b"].children["a"].insert("c")
node.children["A"].children["b"].children["a"].children["c"].insert("a")
node.children["A"].children["b"].children["a"].children["c"].children["a"].word_end = True
node.children["A"].insert("d")
node.children["A"].children["d"].insert("d")
node.children["A"].children["d"].children["d"].word_end = True
node.children["A"].children["d"].insert("m")
node.children["A"].children["d"].children["m"].insert("i")
node.children["A"].children["d"].children["m"].children["i"].insert("n")
node.children["A"].children["d"].children["m"].children["i"].children["n"].word_end = True


print(node.children["A"].get_suffixes())

输出:-

['t', 'baca', 'dd', 'dmin']

要记住的是,每个孩子都是TrieNode该类的一个新实例,因此有自己独立的result数组。

修改插入 + 无结果数组:-

class TrieNode:
    def __init__(self):
        ## Initialize this node in the Trie
        self.word_end = False
        self.children = dict()

    def insert(self, string):
        if len(string) == 0:
            self.word_end = True
            return
        ## Add a child node in this Trie
        if not string[0] in self.children:
            self.children[string[0]] = TrieNode()
        self.children[string[0]].insert(string[1:])

    def get_suffixes(self, suffix=""):
        query_result=[]
        if self.word_end:
            query_result.append(suffix)
        for i in self.children:
            query_result+=self.children[i].get_suffixes(suffix+i)
        return query_result




# Create a mock trie for the test
node = TrieNode()
node.insert("Add")
node.insert("At")
node.insert("Abaca")
node.insert("Admin")

print(node.children["A"].get_suffixes())
print(node.children["A"].get_suffixes())
print(node.children["A"].children["t"].get_suffixes())

输出:-

['dd', 'dmin', 't', 'baca']
['dd', 'dmin', 't', 'baca']
['']
[Finished in 0.0s]

推荐阅读