python - multiprocessing is slower than sequential processing when handling text word by word

Problem description

I need to process text word by word. Since the sequential program I wrote is very slow, I tried to rewrite it with the multiprocessing library. I found that the multiprocessing version is much slower than the sequential one. Am I missing something in my code when using the Pool function? The do_something function runs many for loops and if statements.

Sequential code:
class Text():
    def do_something(self, word):
        ....
        # Computationally heavy code
        ....
        return new_word

....
new_text = []
for sentence in text:
    new_sentence = []
    for word in sentence:
        ....
        new_word = Text().do_something(word)
        new_sentence += new_word
    new_text.append(new_sentence)
print(new_text)
Multiprocessing code:
from multiprocessing import Pool, cpu_count

class Text():
    def do_something(self, word):
        ....
        # Computationally heavy code
        ....
        return new_word

    def do_word(self, word):
        ....
        if len(word) > 2:
            return self.do_something(word).split('$')
        else:
            return ['NONE']

    def do_text(self, text):
        new_text = []
        pool = Pool(processes=cpu_count())
        for sentence in text:
            new_text.append([item for sublist in pool.map(self.do_word, sentence.split()) for item in sublist if item != 'NONE'])
        return new_text

if __name__ == "__main__":
    ....
    print(Text().do_text(file))
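One likely source of the slowdown is that pool.map is called once per sentence and dispatches single words as work items, so each tiny task pays pickling and inter-process round-trip overhead. A minimal sketch of the batching idea, using a hypothetical heavy function as a stand-in for the heavy code: flatten all words into one list, run the pool once with a chunksize, then regroup the results per sentence.

```python
from multiprocessing import Pool, cpu_count

def heavy(word):
    # Hypothetical stand-in for the computationally heavy per-word work.
    return word.upper()

def process_text(text):
    # Split each sentence into words and flatten into one flat list,
    # so the pool is invoked once for the whole text.
    sentences = [sentence.split() for sentence in text]
    flat = [word for sentence in sentences for word in sentence]
    with Pool(processes=cpu_count()) as pool:
        # chunksize batches many words per IPC round-trip.
        results = pool.map(heavy, flat, chunksize=256)
    # Regroup the flat results back into per-sentence lists.
    out, i = [], 0
    for sentence in sentences:
        out.append(results[i:i + len(sentence)])
        i += len(sentence)
    return out

if __name__ == "__main__":
    print(process_text(["hello world", "foo bar baz"]))
```

Whether this wins over the sequential version still depends on how expensive each call really is; if the per-word work is cheap, process overhead can dominate regardless of batching.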
Edit

As Panagiotis Kanavos suggested, I tried multithreading instead of multiprocessing. However, running the code below, the machine seems to use only one core (CPU usage is around 25%, and I have a 4-core CPU). The speed appears to be the same as with the sequential code (which also showed 25% CPU usage).
from multiprocessing import cpu_count
from multiprocessing.dummy import Pool as ThreadPool

class Text():
    def do_something(self, word):
        ....
        # Computationally heavy code
        ....
        return new_word

    def do_word(self, word):
        ....
        if len(word) > 2:
            return self.do_something(word).split('$')
        else:
            return ['NONE']

    def do_text(self, text):
        new_text = []
        pool = ThreadPool(processes=cpu_count())
        for sentence in text:
            new_text.append([item for sublist in pool.map(self.do_word, sentence.split()) for item in sublist if item != 'NONE'])
        return new_text

if __name__ == "__main__":
    ....
    print(Text().do_text(file))
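The ~25% CPU usage on a 4-core machine is consistent with CPython's GIL: only one thread executes Python bytecode at a time, so a thread pool cannot speed up pure-Python CPU-bound work. A small sketch illustrating this, with a hypothetical busy function as the CPU-bound workload:

```python
import time
from multiprocessing.dummy import Pool as ThreadPool

def busy(n):
    # Pure-Python CPU-bound loop: threads serialize on the GIL here.
    total = 0
    for i in range(n):
        total += i * i
    return total

if __name__ == "__main__":
    work = [200_000] * 8

    t0 = time.perf_counter()
    seq = [busy(n) for n in work]
    t_seq = time.perf_counter() - t0

    t0 = time.perf_counter()
    with ThreadPool(4) as pool:
        par = pool.map(busy, work)
    t_par = time.perf_counter() - t0

    # Results match, but with the GIL the threaded run is typically
    # no faster than the sequential one (often slightly slower).
    print(f"sequential: {t_seq:.3f}s  threaded: {t_par:.3f}s")
    assert seq == par
```

Threads only help when the work releases the GIL (I/O, or C extensions such as NumPy); for CPU-bound pure Python, processes are the usual route, provided each task is large enough to amortize the process overhead.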