Parallel Azure blob upload produces the warning "urllib3.connectionpool WARNING - Connection pool is full, discarding connection"

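Problem description

I need to upload a large number of files (more than 100,000) to Azure Blob Storage, so I wrote a program that uploads them with multiple threads, like this: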

from azure.storage.blob import BlobServiceClient
from itertools import repeat
from concurrent.futures import ThreadPoolExecutor
import os

# Assumption: the connection string is read from an environment variable;
# the original snippet used connect_str without defining it.
connect_str = os.environ["AZURE_STORAGE_CONNECTION_STRING"]

def upload_single_blob(blob_service_client, blob_path):
    # Create a blob client using the local file path as the name for the blob
    blob_client = blob_service_client.get_blob_client(container='MyContainer', blob=blob_path)

    # Upload the file
    with open(blob_path, "rb") as data:
        blob_client.upload_blob(data)

# Make the blob service client from the connection string
blob_service_client = BlobServiceClient.from_connection_string(connect_str)
# Build the list of file paths to upload
blob_path_list = [os.path.join("./blob_files/", name) for name in os.listdir("./blob_files/")]

# Upload with multiple threads; list() drains the result iterator so that
# an exception raised inside a worker is re-raised here instead of being lost
with ThreadPoolExecutor(max_workers=100) as executor:
    list(executor.map(upload_single_blob, repeat(blob_service_client), blob_path_list))

However, when I run this program on an Azure VM (the OS is Ubuntu 18.04), I get a lot of warnings like this:

urllib3.connectionpool WARNING --Connection pool is full, discarding connection: myblobaccount.blob.core.windows.net

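I have not measured it precisely, but it looks as if only about 10 connections are in use at the same time, even though 100 threads are uploading in parallel.

How can I increase the number of connections?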
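The figure of roughly 10 matches the default pool size of `requests` (10 retained connections per host), which the synchronous Azure SDK uses as its HTTP transport by default, as far as I know. The sketch below is a toy model of that behaviour, not urllib3's actual code: a bounded `queue.LifoQueue` stands in for the pool, and the function name `simulate_pool` and the dictionary keys are made up for illustration. In this model every thread still gets a connection, but only `maxsize` of them can be put back for reuse; the rest are discarded, and that is the event the warning reports.

```python
import queue
import threading


def simulate_pool(workers=100, maxsize=10):
    """Toy model of a non-blocking connection pool with a bounded size."""
    pool = queue.LifoQueue(maxsize)       # holds idle "connections"
    barrier = threading.Barrier(workers)  # forces all workers to overlap
    lock = threading.Lock()
    stats = {"created": 0, "discarded": 0}

    def worker():
        # Take an idle connection if there is one, otherwise open a new one.
        try:
            conn = pool.get_nowait()
        except queue.Empty:
            conn = object()
            with lock:
                stats["created"] += 1

        barrier.wait()  # every worker now holds a connection at the same time

        # Return the connection; if the pool is already full it is dropped.
        try:
            pool.put_nowait(conn)
        except queue.Full:
            with lock:
                stats["discarded"] += 1

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    stats["kept"] = pool.qsize()
    return stats


if __name__ == "__main__":
    print(simulate_pool())  # {'created': 100, 'discarded': 90, 'kept': 10}
```

If the model is a fair picture of the real pool, the warning means that connections are being thrown away and reopened rather than reused, not that uploads are limited to 10 at a time.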

Tags: python, multithreading, azure, azure-blob-storage, urllib3

Solution
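One possible approach, offered as a sketch rather than a confirmed answer: the synchronous Azure SDK clients use a `requests` session as their default transport, and `requests` keeps at most 10 connections per host unless told otherwise. To my understanding, the client constructors forward a `session` keyword argument to that transport, so a session whose `HTTPAdapter` has a larger `pool_maxsize` can be supplied. The helper names `make_pooled_session` and `make_blob_service_client` are hypothetical, and the `session=` pass-through is an assumption worth checking against the installed `azure-core` version.

```python
import requests
from requests.adapters import HTTPAdapter


def make_pooled_session(pool_size=100):
    """Return a requests session that retains up to pool_size connections per host."""
    session = requests.Session()
    adapter = HTTPAdapter(pool_maxsize=pool_size)
    session.mount("https://", adapter)
    session.mount("http://", adapter)
    return session


def make_blob_service_client(connect_str, pool_size=100):
    """Build a BlobServiceClient whose transport uses the larger pool.

    Assumption: extra keyword arguments such as session= are passed through
    to the requests-based transport of azure-core.
    """
    # Imported here so the session helper above can be used without the Azure SDK.
    from azure.storage.blob import BlobServiceClient

    return BlobServiceClient.from_connection_string(
        connect_str, session=make_pooled_session(pool_size)
    )
```

Setting `pool_size` to match `max_workers` (100 in the question) should let every thread reuse a pooled connection. The alternative is to lower `max_workers` to the default pool size of 10, which also avoids the warning but reduces parallelism.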

