python - Why does the non-threaded program run faster than the threaded program when downloading datasets in Python
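Problem description
I want to download datasets for companies listed on the Indian stock market, so I wrote the code below. It takes too long, because I want to download data for about 1,700 companies.
First I wrote it the regular way, without threads, as follows: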
import pandas_datareader as web
import pandas as pd
import csv
import requests
import time
import concurrent.futures
import datetime
from threading import Thread
start = datetime.date.today() - datetime.timedelta(days=10)
end = yesterday = datetime.date.today() - datetime.timedelta(days=1)
t1 = time.perf_counter()
df = web.DataReader("RELIANCE.NS", 'yahoo', start,end)
df = web.DataReader("TCS.NS", 'yahoo', start,end)
df = web.DataReader("HINDUNILVR.NS", 'yahoo', start,end)
df = web.DataReader("HDFCBANK.NS", 'yahoo', start,end)
df = web.DataReader("HDFC.NS", 'yahoo', start,end)
df = web.DataReader("INFY.NS", 'yahoo', start,end)
df = web.DataReader("KOTAKBANK.NS", 'yahoo', start,end)
df = web.DataReader("BHARTIARTL.NS", 'yahoo', start,end)
df = web.DataReader("ITC.NS", 'yahoo', start,end)
df = web.DataReader("ICICIBANK.NS", 'yahoo', start,end)
df = web.DataReader("SBIN.NS", 'yahoo', start,end)
df = web.DataReader("ASIANPAINT.NS", 'yahoo', start,end)
df = web.DataReader("DMART.NS", 'yahoo', start,end)
df = web.DataReader("BAJFINANCE.NS", 'yahoo', start,end)
df = web.DataReader("MARUTI.NS", 'yahoo', start,end)
df = web.DataReader("HCLTECH.NS", 'yahoo', start,end)
df = web.DataReader("LT.NS", 'yahoo', start,end)
df = web.DataReader("WIPRO.NS", 'yahoo', start,end)
df = web.DataReader("AXISBANK.NS", 'yahoo', start,end)
df = web.DataReader("ULTRACEMCO.NS", 'yahoo', start,end)
df = web.DataReader("HDFCLIFE.NS", 'yahoo', start,end)
df = web.DataReader("COALINDIA.NS", 'yahoo', start,end)
df = web.DataReader("ONGC.NS", 'yahoo', start,end)
df = web.DataReader("SUNPHARMA.NS", 'yahoo', start,end)
df = web.DataReader("NTPC.NS", 'yahoo', start,end)
t2 = time.perf_counter()
print(f'Finished in {t2-t1} seconds')
And the output:
Finished in 27.4473087 seconds
Then I watched some YouTube videos about threading and converted the same program as follows:
import pandas_datareader as web
import pandas as pd
import csv
import requests
import time
import concurrent.futures
import datetime
from threading import Thread
start = datetime.date.today() - datetime.timedelta(days=10)
end = yesterday = datetime.date.today() - datetime.timedelta(days=1)
t1 = time.perf_counter()
shareSymbols = [
    "RELIANCE.NS", "TCS.NS", "HINDUNILVR.NS", "HDFCBANK.NS", "HDFC.NS",
    "INFY.NS", "KOTAKBANK.NS", "BHARTIARTL.NS", "ITC.NS", "ICICIBANK.NS",
    "SBIN.NS", "ASIANPAINT.NS", "DMART.NS", "BAJFINANCE.NS", "MARUTI.NS",
    "HCLTECH.NS", "LT.NS", "WIPRO.NS", "AXISBANK.NS", "ULTRACEMCO.NS",
    "HDFCLIFE.NS", "COALINDIA.NS", "ONGC.NS", "SUNPHARMA.NS", "NTPC.NS"
]
def download_data(shareSymbol):
    df = web.DataReader(shareSymbols, 'yahoo', start,end)
with concurrent.futures.ThreadPoolExecutor() as executor:
    executor.map(download_data, shareSymbols)
t2 = time.perf_counter()
print(f'Finished in {t2-t1} seconds')
And the output of the above code:
Finished in 83.4883162 seconds
Why does the first program take less time than the second one? Do I need to change anything?
Solution
Python has something called the GIL (Global Interpreter Lock): https://en.wikipedia.org/wiki/Global_interpreter_lock . You may want to consider multiprocessing for this task.
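Separately from the GIL, note that download_data as posted passes the whole shareSymbols list to web.DataReader instead of its shareSymbol argument, so each of the 25 tasks requests all 25 symbols. Blocking network calls generally release the GIL while they wait, so a thread pool may still help once each task fetches a single symbol. Below is a minimal sketch of that per-symbol version. It assumes a hypothetical fetch_one function that stands in for web.DataReader(symbol, 'yahoo', start, end) and only sleeps, so the example runs without network access; the shortened symbol list and max_workers value are also illustrative.

```python
import concurrent.futures
import time

share_symbols = ["RELIANCE.NS", "TCS.NS", "INFY.NS", "ITC.NS"]


def fetch_one(symbol):
    """Hypothetical stand-in for web.DataReader(symbol, 'yahoo', start, end)."""
    time.sleep(0.1)  # simulate waiting on the network
    return f"data for {symbol}"


def download_data(share_symbol):
    # Use the single symbol passed to this call, not the whole list.
    return share_symbol, fetch_one(share_symbol)


def download_all(symbols, max_workers=8):
    with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as executor:
        # executor.map yields results in the same order as the input symbols.
        return dict(executor.map(download_data, symbols))


if __name__ == "__main__":
    t1 = time.perf_counter()
    results = download_all(share_symbols)
    t2 = time.perf_counter()
    print(f"Downloaded {len(results)} symbols in {t2 - t1:.2f} seconds")
```

Returning the data from download_data and collecting it in a dict also keeps each result, whereas the original function assigns it to a local variable and discards it.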
For multiprocessing, the concurrent.futures package provides
class concurrent.futures.ProcessPoolExecutor(max_workers=None, mp_context=None, initializer=None, initargs=())
for this purpose.
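A rough sketch of how ProcessPoolExecutor could be used here follows. The download_data body is a hypothetical placeholder for the real per-symbol download, and the executor_cls parameter and max_workers value are assumptions added for illustration, not part of the original program. Worker processes have startup and pickling overhead, so whether this beats a thread pool for network-bound downloads is something to measure rather than assume.

```python
import concurrent.futures

share_symbols = ["RELIANCE.NS", "TCS.NS", "INFY.NS", "ITC.NS"]


def download_data(share_symbol):
    """Hypothetical placeholder for the real per-symbol download.

    It is defined at module level so that worker processes can pickle it.
    """
    return share_symbol, f"data for {share_symbol}"


def download_all(symbols,
                 executor_cls=concurrent.futures.ProcessPoolExecutor,
                 max_workers=4):
    # executor_cls is a parameter only so the same code can be timed with
    # ThreadPoolExecutor for comparison.
    with executor_cls(max_workers=max_workers) as executor:
        return dict(executor.map(download_data, symbols))


if __name__ == "__main__":
    # The guard keeps worker processes from re-running this block on
    # platforms that start workers by importing the main module.
    results = download_all(share_symbols)
    print(f"Downloaded {len(results)} symbols")
```

Passing executor_cls=concurrent.futures.ThreadPoolExecutor runs the same function with threads, which makes it easy to time both approaches on the real download.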