How to download multiple files at the same time and trigger a specific action for each one on completion?

Problem description

I need help with something I'm trying to implement; unfortunately I'm not very comfortable with multithreading.

My script downloads 4 different files from the Internet, calls a dedicated function for each of them, and then saves them all. The problem is that I do this step by step, so I have to wait for each download to finish before moving on to the next one.

I have an idea of what I should do to solve this, but I haven't managed to write the code.

Actual behavior:

url_list = [Url1, Url2, Url3, Url4]
files_list = []

files_list.append(downloadFile(Url1))
handleFile(files_list[-1], type=0)
...
files_list.append(downloadFile(Url4))
handleFile(files_list[-1], type=3)
saveAll(files_list)

Desired behavior:

url_list = [Url1, Url2, Url3, Url4]
files_list = []

for url in url_list:
    callThread(files_list.append(downloadFile(url)),              # function
               handleFile(files_list[url.index], type=url.index)) # trigger
    # use a thread for downloading
    # once a file is downloaded, it triggers its associated function
# wait for all files to be processed
saveAll(files_list)

Thanks for your help!

Tags: python, multithreading, python-2.7

Solution


The typical approach is to put the IO-heavy part, such as fetching data over the Internet, and the processing of that data into the same function:

import random
import threading
import time
from concurrent.futures import ThreadPoolExecutor

import requests


def fetch_and_process_file(url):
    thread_name = threading.current_thread().name

    print(thread_name, "fetch", url)
    data = requests.get(url).text

    # "process" result
    time.sleep(random.random() / 4)  # simulate work
    print(thread_name, "process data from", url)

    result = len(data) ** 2
    return result


threads = 2
urls = ["https://google.com", "https://python.org", "https://pypi.org"]

executor = ThreadPoolExecutor(max_workers=threads)
with executor:
    results = executor.map(fetch_and_process_file, urls)

print()
print("results:", list(results))

Output:

ThreadPoolExecutor-0_0 fetch https://google.com
ThreadPoolExecutor-0_1 fetch https://python.org
ThreadPoolExecutor-0_0 process data from https://google.com
ThreadPoolExecutor-0_0 fetch https://pypi.org
ThreadPoolExecutor-0_0 process data from https://pypi.org
ThreadPoolExecutor-0_1 process data from https://python.org
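
To map this back onto the structure from the question, where each URL has its own handler index, one option is a small wrapper function; executor.map also accepts several iterables and pairs them up like zip(). This is only a sketch that assumes the asker's own downloadFile, handleFile and saveAll work as shown in the question:

from concurrent.futures import ThreadPoolExecutor

def download_and_handle(url, type_index):
    data = downloadFile(url)           # IO-heavy download runs in a worker thread
    handleFile(data, type=type_index)  # per-file trigger fires right after its own download
    return data

with ThreadPoolExecutor(max_workers=4) as executor:
    # results come back in the order of url_list, regardless of completion order
    files_list = list(executor.map(download_and_handle,
                                   url_list,
                                   range(len(url_list))))

saveAll(files_list)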

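If the per-file trigger should instead run in the main thread as soon as each download finishes, as_completed yields futures in completion order rather than submission order. Again, a sketch built on the same assumed functions from the question:

from concurrent.futures import ThreadPoolExecutor, as_completed

with ThreadPoolExecutor(max_workers=4) as executor:
    # submit one download per URL and remember each future's index
    futures = {executor.submit(downloadFile, url): i
               for i, url in enumerate(url_list)}
    files_list = [None] * len(url_list)
    for future in as_completed(futures):
        i = futures[future]
        data = future.result()
        handleFile(data, type=i)  # runs as soon as this download completes
        files_list[i] = data

saveAll(files_list)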