How can I quickly add a large number of words to a FuzzySet?

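Problem description

I have a corpus of about 5 million words that I want to put into a fuzzy set. At the moment this takes about 5 minutes. Is there a faster way to do it?

Here is my code: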

    import fuzzyset
    fuzzy_set = fuzzyset.FuzzySet()
    for word in list_of_words:  # len(list_of_words) is about 5M
        fuzzy_set.add(word)

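I know a for loop is not the fastest way to do things in Python, but I could not find any documentation on adding a whole list to a FuzzySet.

Thanks for your help.

Tags: python, list, itertools, fuzzy-search

Solution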


You could use multiprocessing: split list_of_words into chunks and process the chunks in a pool of worker processes.

    import multiprocessing as mp

    import fuzzyset

    fuzzy_set = fuzzyset.FuzzySet()

    def add_words(chunk):
        # Runs in a worker process and updates that process's copy of fuzzy_set.
        for word in chunk:
            fuzzy_set.add(word)

    if __name__ == '__main__':
        # list_of_words is the ~5M-word list from the question.
        n = 500000  # or whatever size you want your chunks split up into

        chunks = [list_of_words[x:x + n] for x in range(0, len(list_of_words), n)]

        with mp.Pool(mp.cpu_count()) as pool:
            pool.map(add_words, chunks)
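
One thing to check with this pattern: each worker process has its own copy of module-level objects, so anything a worker adds to a global stays in that worker unless it is returned to the parent. The sketch below shows this with the standard library only. It is an illustration, not FuzzySet code: a plain `set` stands in for the fuzzy set so it runs without the library, and the names `shared_words`, `chunked`, `add_in_worker` and `build_in_worker` are made up for the example. Whether per-chunk FuzzySet objects can be merged in the parent is not something this sketch assumes or shows.

```python
import multiprocessing as mp

# Module-level state, analogous to the global fuzzy_set in the answer.
# (A plain set is used as a stand-in; this is an assumption for the demo.)
shared_words = set()


def chunked(items, size):
    """Split items into consecutive lists of at most `size` elements."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def add_in_worker(chunk):
    """Mutate the global; this only changes the worker's own copy."""
    shared_words.update(chunk)
    return len(chunk)


def build_in_worker(chunk):
    """Build a per-chunk result and return it to the parent."""
    return set(chunk)


def main():
    words = ["apple", "apply", "angle", "ankle", "maple", "ample"]
    chunks = chunked(words, 2)
    with mp.Pool(2) as pool:
        # Side effects on shared_words happen in the workers only.
        pool.map(add_in_worker, chunks)
        # Returned values are pickled and sent back to the parent.
        parts = pool.map(build_in_worker, chunks)
    merged = set().union(*parts)
    # In the parent, shared_words is still empty; merged holds every word.
    print(len(shared_words), len(merged))


if __name__ == "__main__":
    main()
```

Results returned from workers are pickled and copied back to the parent, so for a large index that transfer can cost as much as the work saved. It is worth timing on a sample of the data before committing to this approach.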
