首页 > 解决方案 > 多进程比较多个 .txt 文件中的字符串?

问题描述

我有几个 txt 文件,每个文件大约有一百万行,搜索等式大约需要一分钟。文件保存为 0.txt、1.txt、2.txt、... 为方便起见,in_1 和 searchType 是用户给定的输入。

class ResearchManager():
def __init__(self,searchType,in_1,file):
    self.file = file
    self.searchType = searchType
    self.in_1 = in_1
    
def Search(self):
    
    current_db = open(str(self.file) + ".txt",'r')
    .
    .
    .

    #Current file processing


if __name__ == '__main__':

n_file = 35
for number in range(n_file):
    RM = ResearchManager(input_n, input_1, number)
    RM.Search()

我想使用多处理优化搜索过程,但我没有成功。有没有办法做到这一点?谢谢你。

编辑。

我能够以这种方式使用线程:

class ResearchManager(threading.Thread):
def __init__(self, searchType, in_1, file):
    threading.Thread.__init__(self)
    self.file = file
    self.searchType = searchType
    self.in_1 = in_1
    
def run(self):
current_db = open(str(self.file) + ".txt",'r')
.
.
.

#Current file processing

...

        threads=[]
        for number in range(n_file+1):
            
            threads.append(ResearchManager(input_n,input_1,number))

        start=time.time()
        
        for t in threads:
            t.start()
            
        for t in threads:
            t.join()
        end=time.time()

但是总的执行时间甚至比正常的for循环还要长几秒。

标签: pythonperformancefor-loopmultiprocessing

解决方案


你能展示你在线程方面的尝试吗?看看这篇文章,它很好地提供了对 python 线程如何工作的基本理解。

https://realpython.com/intro-to-python-threading/

import logging
import threading
import time

def thread_function(name):
    logging.info("Thread %s: starting", name)
    time.sleep(2)
    logging.info("Thread %s: finishing", name)

if __name__ == "__main__":
    format = "%(asctime)s: %(message)s"
    logging.basicConfig(format=format, level=logging.INFO,
                        datefmt="%H:%M:%S")

    threads = list()
    for index in range(3):
        logging.info("Main    : create and start thread %d.", index)
        x = threading.Thread(target=thread_function, args=(index,))
        threads.append(x)
        x.start()

    for index, thread in enumerate(threads):
        logging.info("Main    : before joining thread %d.", index)
        thread.join()
        logging.info("Main    : thread %d done", index)

推荐阅读