python - Multiple requests over a single connection with Python
Problem description
When I download a file with a single request, I do the following:
import requests

session = requests.Session()
params = {'fd': 1, 'count': 1024, 'auth': 'auth_token'}
r = session.get('https://httpbin.org/bytes/9', params=params)
print(r.content)
# b'\xb3_\\l\xe2\xbf/:\x07'
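How can I send multiple requests without waiting for the responses?
The server API documentation says:
You can push multiple requests over a single connection without waiting for the replies, to improve performance. The server processes requests in the order they are received, and you are guaranteed to receive the replies in the same order. However, it is important to send all requests with "Connection: keep-alive"; otherwise the API server will close the connection without processing the pending requests.
They are describing a single thread that issues multiple requests without waiting for the answers. I believe this is called HTTP pipelining.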
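To illustrate what the quoted documentation describes at the wire level, here is a minimal sketch that writes several GET requests to one socket before reading any reply, then reads the replies in order. It is not taken from the thread: the helper names (`build_pipelined_requests`, `read_response`, `pipelined_get`, `demo`) and the local echo server are assumptions made for illustration, and the parser assumes every reply carries a Content-Length header (no chunked encoding).

```python
import socket
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


def build_pipelined_requests(host, paths):
    """Concatenate several GET requests so they can be sent in one write."""
    template = "GET {path} HTTP/1.1\r\nHost: {host}\r\nConnection: keep-alive\r\n\r\n"
    return "".join(template.format(path=p, host=host) for p in paths).encode("ascii")


def read_response(reader):
    """Read one reply; assumes the body length is given by Content-Length."""
    status_line = reader.readline().decode("iso-8859-1").rstrip("\r\n")
    headers = {}
    while True:
        line = reader.readline().decode("iso-8859-1").rstrip("\r\n")
        if not line:  # blank line ends the header block
            break
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    body = reader.read(int(headers.get("content-length", 0)))
    return status_line, body


def pipelined_get(host, port, paths):
    """Send every request first, then read the replies in the same order."""
    with socket.create_connection((host, port), timeout=10) as sock:
        sock.sendall(build_pipelined_requests(host, paths))
        with sock.makefile("rb") as reader:
            return [read_response(reader) for _ in paths]


class EchoPathHandler(BaseHTTPRequestHandler):
    """Hypothetical local test server: replies with the requested path."""
    protocol_version = "HTTP/1.1"  # enables persistent connections

    def do_GET(self):
        body = self.path.encode("ascii")
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo quiet


def demo():
    """Run the pipelined client against the local server on a free port."""
    server = HTTPServer(("127.0.0.1", 0), EchoPathHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        return pipelined_get("127.0.0.1", server.server_address[1], ["/a", "/b", "/c"])
    finally:
        server.shutdown()
        server.server_close()
```

Calling `demo()` returns one `(status_line, body)` pair per path, in request order. A real API would need TLS and more careful parsing, so treat this only as a picture of the request/reply ordering.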
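How can I do this with the Python Requests library?
A similar answer suggests using parallel calls, which is not what my question is about. It also says: "requests pools connections, keeping the TCP connection open". How do I make use of that?
If this is not possible with requests, is there another synchronous library I can use instead?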
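Solution
You can fetch multiple pages in parallel without threads. The code below exploits HTTP pipelining by resetting the state of HTTPSConnection (a private variable!) to trick it into sending the next request before the previous response has been read. Because it relies on a private attribute of http.client, it may break in other Python versions.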
from http.client import HTTPSConnection, _CS_IDLE
from urllib.parse import urlparse, urlunparse


def pipeline(host, pages, max_out_bound=4, debuglevel=0):
    page_count = len(pages)
    conn = HTTPSConnection(host)
    conn.set_debuglevel(debuglevel)
    responses = [None] * page_count
    finished = [False] * page_count
    content = [None] * page_count
    headers = {'Host': host, 'Content-Length': 0, 'Connection': 'Keep-Alive'}

    while not all(finished):
        # Send
        out_bound = 0
        for i, page in enumerate(pages):
            if out_bound >= max_out_bound:
                break
            elif page and not finished[i] and responses[i] is None:
                if debuglevel > 0:
                    print('Sending request for %r...' % (page,))
                conn._HTTPConnection__state = _CS_IDLE  # private variable!
                conn.request("GET", page, None, headers)
                responses[i] = conn.response_class(conn.sock, method=conn._method)
                out_bound += 1
        # Try to read a response
        for i, resp in enumerate(responses):
            if resp is None:
                continue
            if debuglevel > 0:
                print('Retrieving %r...' % (pages[i],))
            out_bound -= 1
            skip_read = False
            resp.begin()
            if debuglevel > 0:
                print(' %d %s' % (resp.status, resp.reason))
            if 200 <= resp.status < 300:
                # Ok
                content[i] = resp.read()
                cookie = resp.getheader('Set-Cookie')
                if cookie is not None:
                    headers['Cookie'] = cookie
                skip_read = True
                finished[i] = True
                responses[i] = None
            elif 300 <= resp.status < 400:
                # Redirect
                loc = resp.getheader('Location')
                responses[i] = None
                parsed = loc and urlparse(loc)
                if not parsed:
                    # Missing or empty location header
                    content[i] = (resp.status, resp.reason)
                    finished[i] = True
                elif parsed.netloc != '' and parsed.netloc != host:
                    # Redirect to another host
                    content[i] = (resp.status, resp.reason, loc)
                    finished[i] = True
                else:
                    path = urlunparse(parsed._replace(scheme='', netloc='', fragment=''))
                    if debuglevel > 0:
                        print(' Updated %r to %r' % (pages[i], path))
                    pages[i] = path
            elif resp.status >= 400:
                # Failed
                content[i] = (resp.status, resp.reason)
                finished[i] = True
                responses[i] = None
            if resp.will_close:
                # Connection (will be) closed, need to resend
                conn.close()
                if debuglevel > 0:
                    print(' Connection closed')
                for j, f in enumerate(finished):
                    if not f and responses[j] is not None:
                        if debuglevel > 0:
                            print(' Discarding out-bound request for %r' % (pages[j],))
                        responses[j] = None
                break
            elif not skip_read:
                resp.read()  # read any data
            if any(not f and responses[j] is None for j, f in enumerate(finished)):
                # Send another pending request
                break
        else:
            break  # All responses are None?
    return content


if __name__ == '__main__':
    domain = 'en.wikipedia.org'
    pages = ['/wiki/HTTP_pipelining', '/wiki/HTTP', '/wiki/HTTP_persistent_connection']
    data = pipeline(domain, pages, max_out_bound=3, debuglevel=1)
    for i, page in enumerate(data):
        print()
        print('==== Page %r ====' % (pages[i],))
        print(page[:512])
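For context on why the listing above has to reset the private state, the following small sketch shows what http.client does when a second request is issued before the first response has been read. It is an illustration under assumptions, not part of the original answer: the function name `second_request_is_refused` is hypothetical, and the behaviour shown is what CPython's http.client does as far as I know.

```python
import socket
from http.client import CannotSendRequest, HTTPConnection


def second_request_is_refused():
    """Return True if http.client rejects a request sent before the first reply is read."""
    # A bare listening socket is enough here: the kernel completes the TCP
    # handshake and buffers the request bytes, even though nothing answers.
    listener = socket.socket()
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)
    conn = HTTPConnection("127.0.0.1", listener.getsockname()[1], timeout=5)
    try:
        conn.request("GET", "/first")  # connection state leaves "idle"
        try:
            conn.request("GET", "/second")  # no getresponse() in between
        except CannotSendRequest:
            return True
        return False
    finally:
        conn.close()
        listener.close()
```

Setting `conn._HTTPConnection__state = _CS_IDLE` before each `conn.request(...)`, as the answer does, sidesteps this check. That is also why the approach is fragile: the attribute is name-mangled and undocumented.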