Why does a non-threaded program download datasets faster than a threaded program in Python?

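Question

I want to download datasets for companies listed on the Indian stock market, so I wrote the code below. It takes too long, because I need data for about 1700 companies.

First I wrote it the regular way, without threads, as follows: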

import pandas_datareader as web
import pandas as pd
import csv
import requests
import time
import concurrent.futures
import datetime
from threading import Thread

start = datetime.date.today() - datetime.timedelta(days=10)
end = yesterday = datetime.date.today() - datetime.timedelta(days=1)

t1 = time.perf_counter()


df = web.DataReader("RELIANCE.NS", 'yahoo', start,end)
df = web.DataReader("TCS.NS", 'yahoo', start,end)
df = web.DataReader("HINDUNILVR.NS", 'yahoo', start,end)
df = web.DataReader("HDFCBANK.NS", 'yahoo', start,end)
df = web.DataReader("HDFC.NS", 'yahoo', start,end)
df = web.DataReader("INFY.NS", 'yahoo', start,end)
df = web.DataReader("KOTAKBANK.NS", 'yahoo', start,end)
df = web.DataReader("BHARTIARTL.NS", 'yahoo', start,end)
df = web.DataReader("ITC.NS", 'yahoo', start,end)
df = web.DataReader("ICICIBANK.NS", 'yahoo', start,end)
df = web.DataReader("SBIN.NS", 'yahoo', start,end)
df = web.DataReader("ASIANPAINT.NS", 'yahoo', start,end)
df = web.DataReader("DMART.NS", 'yahoo', start,end)
df = web.DataReader("BAJFINANCE.NS", 'yahoo', start,end)
df = web.DataReader("MARUTI.NS", 'yahoo', start,end)
df = web.DataReader("HCLTECH.NS", 'yahoo', start,end)
df = web.DataReader("LT.NS", 'yahoo', start,end)
df = web.DataReader("WIPRO.NS", 'yahoo', start,end)
df = web.DataReader("AXISBANK.NS", 'yahoo', start,end)
df = web.DataReader( "ULTRACEMCO.NS", 'yahoo', start,end)
df = web.DataReader("HDFCLIFE.NS", 'yahoo', start,end)
df = web.DataReader("COALINDIA.NS", 'yahoo', start,end)
df = web.DataReader("ONGC.NS", 'yahoo', start,end)
df = web.DataReader("SUNPHARMA.NS", 'yahoo', start,end)
df = web.DataReader("NTPC.NS", 'yahoo', start,end)


t2 = time.perf_counter()

print(f'Finished in {t2-t1} seconds')

Output

Finished in 27.4473087 seconds

Then I watched some YouTube videos about threading and converted the same program as follows:

import pandas_datareader as web
import pandas as pd
import csv
import requests
import time
import concurrent.futures
import datetime
from threading import Thread

start = datetime.date.today() - datetime.timedelta(days=10)
end = yesterday = datetime.date.today() - datetime.timedelta(days=1)

t1 = time.perf_counter()


shareSymbols = [
   "RELIANCE.NS", "TCS.NS", "HINDUNILVR.NS", "HDFCBANK.NS", "HDFC.NS", "INFY.NS","KOTAKBANK.NS","BHARTIARTL.NS", "ITC.NS", "ICICIBANK.NS", "SBIN.NS", "ASIANPAINT.NS","DMART.NS", "BAJFINANCE.NS", "MARUTI.NS", "HCLTECH.NS","LT.NS", "WIPRO.NS", "AXISBANK.NS", "ULTRACEMCO.NS", "HDFCLIFE.NS" ,"COALINDIA.NS", "ONGC.NS", "SUNPHARMA.NS", "NTPC.NS"
]
def download_data(shareSymbol):
    df = web.DataReader(shareSymbols, 'yahoo', start,end)


with concurrent.futures.ThreadPoolExecutor() as executor:
    executor.map(download_data, shareSymbols)    

# Take the end time after the with block, once all submitted tasks have finished
t2 = time.perf_counter()

print(f'Finished in {t2-t1} seconds')

And the output of the code above:

Finished in 83.4883162 seconds

Why does the first program take less time than the second? Do I need to change anything?

Tags: python, multithreading, python-multithreading

Answer


Python has something called the GIL (Global Interpreter Lock): https://en.wikipedia.org/wiki/Global_interpreter_lock. You may want to consider multiprocessing for this task.
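Before switching to processes, it may be worth checking how much the GIL matters for this workload. CPython releases the GIL during blocking calls such as `time.sleep` and socket I/O, so threads can overlap time spent waiting. The sketch below is only a rough illustration under that assumption: `fake_download` is a hypothetical stand-in for `web.DataReader` that just sleeps, and the symbol list and delay are made up for the demo.

```python
import time
from concurrent.futures import ThreadPoolExecutor

# A few symbols taken from the question's list; any strings would do here.
SYMBOLS = ["RELIANCE.NS", "TCS.NS", "INFY.NS", "ITC.NS", "SBIN.NS"]


def fake_download(symbol, delay=0.2):
    """Hypothetical stand-in for web.DataReader(symbol, ...): it only waits."""
    time.sleep(delay)  # blocking wait; the GIL is released while sleeping
    return symbol


def run_sequential(symbols):
    """Call fake_download once per symbol, one after another."""
    start = time.perf_counter()
    results = [fake_download(s) for s in symbols]
    return results, time.perf_counter() - start


def run_threaded(symbols):
    """Call fake_download once per symbol, each on its own worker thread."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=len(symbols)) as executor:
        # map passes ONE symbol to each call and preserves input order
        results = list(executor.map(fake_download, symbols))
    # the with block has exited here, so every task has completed
    return results, time.perf_counter() - start


seq_results, seq_time = run_sequential(SYMBOLS)
thr_results, thr_time = run_threaded(SYMBOLS)
print(f"sequential: {seq_time:.2f}s, threaded: {thr_time:.2f}s")
```

Note also that `download_data` in the question passes `shareSymbols` (the whole list) to `web.DataReader` instead of its `shareSymbol` argument, so each of the 25 tasks requests all 25 symbols. That alone could account for much of the slowdown, whichever executor is used.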

The concurrent.futures package provides

class concurrent.futures.ProcessPoolExecutor(max_workers=None, mp_context=None, initializer=None, initargs=())

for this purpose.
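A minimal sketch of how `ProcessPoolExecutor` could be applied, assuming one task per symbol. The body of `download_data` is a placeholder that does no network access, and `download_all` and `max_workers=4` are illustrative choices, not part of the original code. In the real program the placeholder line would be replaced by the `web.DataReader` call for that single symbol.

```python
from concurrent.futures import ProcessPoolExecutor

# Shortened symbol list for the sketch; the question uses 25 symbols.
SHARE_SYMBOLS = ["RELIANCE.NS", "TCS.NS", "HINDUNILVR.NS", "HDFCBANK.NS"]


def download_data(share_symbol):
    """Placeholder worker: handles exactly one symbol per call.

    Assumption: the real version would call
    web.DataReader(share_symbol, 'yahoo', start, end) here.
    """
    return share_symbol, len(share_symbol)


def download_all(symbols, max_workers=4):
    """Run download_data for each symbol in a pool of worker processes."""
    with ProcessPoolExecutor(max_workers=max_workers) as executor:
        # The worker must be a module-level function so it can be pickled
        return dict(executor.map(download_data, symbols))


if __name__ == "__main__":
    # The guard keeps worker processes from re-running this block on import
    results = download_all(SHARE_SYMBOLS)
    print(results)
```

Worker processes have start-up and pickling overhead, so whether this beats a thread pool for network downloads is something to measure rather than assume.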

