Multiprocessing is slower than sequential processing when handling text word by word

Problem description

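I need to process text word by word. Because the sequential program I wrote is very slow, I tried rewriting it with the multiprocessing library. The multiprocessing version turned out to be much slower than the sequential one. Am I missing something in how I use Pool? The do_something function executes many for loops and if statements.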

Sequential code:

class Text():
    def do_something(self, word):
        ....
        # Computationally heavy code
        ....
        return new_word
....
new_text = []
for sentence in text:
    new_sentence = []
    for word in sentence:
        ....
        new_word = Text().do_something(word)
        new_sentence += new_word
    new_text.append(new_sentence)
print(new_text)

Multiprocessing code:

from multiprocessing import Pool, cpu_count

class Text():
    def do_something(self, word):
        ....
        # Computationally heavy code
        ....
        return new_word

    def do_word(self, word):
        ....
        if len(word) > 2:
            return self.do_something(word).split('$')
        else:
            return ['NONE']

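    def do_text(self, text):
        new_text = []
        # Create the pool once and release its workers when done.
        with Pool(processes=cpu_count()) as pool:
            for sentence in text:
                # One pool.map call per sentence; drop the 'NONE' placeholders.
                words = pool.map(self.do_word, sentence.split())
                new_text.append([item for sublist in words
                                 for item in sublist if item != 'NONE'])
        return new_text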

if __name__ == "__main__":
    ....
    print(Text().do_text(file))

Edit

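As Panagiotis Kanavos suggested, I tried multithreading instead of multiprocessing. However, when I run the code below, the machine appears to use only one core (CPU usage is about 25%, and I have a 4-core CPU). The speed appears to be the same as with the sequential code (which also shows 25% CPU usage).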

from multiprocessing import cpu_count
from multiprocessing.dummy import Pool as ThreadPool

class Text():
    def do_something(self, word):
        ....
        # Computationally heavy code
        ....
        return new_word

    def do_word(self, word):
        ....
        if len(word) > 2:
            return self.do_something(word).split('$')
        else:
            return ['NONE']

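    def do_text(self, text):
        new_text = []
        # Same structure as the process version, but the workers are threads.
        with ThreadPool(processes=cpu_count()) as pool:
            for sentence in text:
                # One pool.map call per sentence; drop the 'NONE' placeholders.
                words = pool.map(self.do_word, sentence.split())
                new_text.append([item for sublist in words
                                 for item in sublist if item != 'NONE'])
        return new_text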

if __name__ == "__main__":
    ....
    print(Text().do_text(file))
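
The 25% figure is what one would expect on standard CPython, assuming do_something is pure-Python, CPU-bound code: multiprocessing.dummy runs its workers as threads inside a single process, and the global interpreter lock lets only one thread execute Python bytecode at a time. The sketch below does not measure speed; it only shows that thread-pool workers all share the parent's process ID while process-pool workers do not. The names worker_pid, pids_with_threads and pids_with_processes are illustrative and not part of the original code.

```python
import os
from multiprocessing import Pool
from multiprocessing.dummy import Pool as ThreadPool


def worker_pid(_):
    # Report which operating-system process ran this task.
    return os.getpid()


def pids_with_threads(n_tasks, workers=4):
    # Thread workers all live inside the calling process.
    with ThreadPool(processes=workers) as pool:
        return set(pool.map(worker_pid, range(n_tasks)))


def pids_with_processes(n_tasks, workers=4):
    # Process workers are separate processes, each with its own interpreter.
    with Pool(processes=workers) as pool:
        return set(pool.map(worker_pid, range(n_tasks)))


if __name__ == "__main__":
    print("parent :", os.getpid())
    print("threads:", pids_with_threads(8))
    print("procs  :", pids_with_processes(8))
```

If the heavy work were done inside a C extension that releases the lock, threads could still help; for loops and ifs written in Python, they generally will not.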

Tags: python, multiprocessing, python-multiprocessing, python-multithreading

Solution
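
One plausible cause, stated as an assumption because do_something is not shown: the multiprocessing code issues one pool.map call per sentence with one task per word, so every word is pickled, sent to a worker, and its result sent back, and all workers wait for each other at the end of every sentence. If the work per word is small, that overhead can outweigh the parallel gain. The sketch below instead sends whole sentences as tasks through a single pool.map call with a chunksize. Here do_something is a hypothetical stand-in for the real function, and process_sentence and process_text are illustrative names.

```python
from multiprocessing import Pool, cpu_count


def do_something(word):
    # Hypothetical stand-in for the original heavy code: it returns a
    # '$'-separated string, which is what the original do_word splits.
    return word.upper() + "$" + word[::-1]


def do_word(word):
    # Same rule as the original: only words longer than two characters
    # are processed. An empty list replaces the 'NONE' placeholder.
    if len(word) > 2:
        return do_something(word).split("$")
    return []


def process_sentence(sentence):
    # One task is one whole sentence, so each task carries more work
    # relative to the cost of shipping it to a worker process.
    return [item for word in sentence.split() for item in do_word(word)]


def process_text(text, chunksize=64):
    # A single pool.map call over all sentences; chunksize batches
    # several sentences into each message sent to a worker.
    with Pool(processes=cpu_count()) as pool:
        return pool.map(process_sentence, text, chunksize=chunksize)


if __name__ == "__main__":
    sample = ["the quick brown fox", "it is a dog"]
    print(process_text(sample))
```

Whether this is faster than the sequential loop depends on how expensive do_something really is, so it is worth timing both versions on the real input before committing to either.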

