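python - Receiving the body content of an HTTP response in Python
Question
I am trying to read the content of the response body, but whenever I call sock.recv I always get 0 bytes back. I already receive the header and that works fine, although I read it one byte at a time. My problem now is: I have the length from the Content-Length header plus the header. Now I want to receive the body separately (Task 3d).
PS: I know it cannot work the way it does in the screenshot, but I have not found another solution yet.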
# -*- coding: utf-8 -*-
"""
task3.simple_web_browser
XX-YYY-ZZZ
<Your name>
"""
from socket import gethostbyname, socket, timeout, AF_INET, SOCK_STREAM
from sys import argv
HTTP_HEADER_DELIMITER = b'\r\n\r\n'
CONTENT_LENGTH_FIELD = b'Content-Length:'
HTTP_PORT = 80
ONE_BYTE_LENGTH = 1
def create_http_request(host, path, method='GET'):
    '''
    Create a sequence of bytes representing an HTTP/1.1 request of the given method.
    :param host: the string contains the hostname of the remote server
    :param path: the string contains the path to the document to retrieve
    :param method: the string contains the HTTP request method (e.g., 'GET', 'HEAD', etc...)
    :return: a bytes object contains the HTTP request to send to the remote server
    e.g.,) An HTTP/1.1 GET request to http://compass.unisg.ch/
        host: compass.unisg.ch
        path: /
        return: b'GET / HTTP/1.1\nHost: compass.unisg.ch\r\n\r\n'
    '''
    ### Task 3(a) ###
    # Hint 1: see RFC7230-7231 for the HTTP/1.1 syntax and semantics specification
    # https://tools.ietf.org/html/rfc7230
    # https://tools.ietf.org/html/rfc7231
    # Hint 2: use str.encode() to create an encoded version of the string as a bytes object
    # https://docs.python.org/3/library/stdtypes.html#str.encode
    r = '{} {} HTTP/1.1\nHost: {}\r\n\r\n'.format(method, path, host)
    response = r.encode()
    return response
    ### Task 3(a) END ###
def get_content_length(header):
    '''
    Get the integer value from the Content-Length HTTP header field if it
    is found in the given sequence of bytes. Otherwise returns 0.
    :param header: the bytes object contains the HTTP header
    :return: an integer value of the Content-Length, 0 if not found
    '''
    ### Task 3(c) ###
    # Hint: use CONTENT_LENGTH_FIELD to find the value
    # Note that the Content-Length field may not be always at the end of the header.
    for line in header.split(b'\r\n'):
        if CONTENT_LENGTH_FIELD in line:
            return int(line[len(CONTENT_LENGTH_FIELD):])
    return 0
    ### Task 3(c) END ###
def receive_body(sock, content_length):
    '''
    Receive the body content in the HTTP response
    :param sock: the TCP socket connected to the remote server
    :param content_length: the size of the content to receive
    :return: a bytes object contains the remaining content (body) in the HTTP response
    '''
    ### Task 3(d) ###
    body = bytes()
    data = bytes()
    while True:
        data = sock.recv(content_length)
        if len(data)<=0:
            break
        else:
            body += data
    return body
    ### Task 3(d) END ###
def receive_http_response_header(sock):
    '''
    Receive the HTTP response header from the TCP socket.
    :param sock: the TCP socket connected to the remote server
    :return: a bytes object that is the HTTP response header received
    '''
    ### Task 3(b) ###
    # Hint 1: use HTTP_HEADER_DELIMITER to determine the end of the HTTP header
    # Hint 2: use sock.recv(ONE_BYTE_LENGTH) to receive the chunk byte-by-byte
    header = bytes()
    chunk = bytes()
    try:
        while HTTP_HEADER_DELIMITER not in chunk:
            chunk = sock.recv(ONE_BYTE_LENGTH)
            if not chunk:
                break
            else:
                header += chunk
    except socket.timeout:
        pass
    return header
    ### Task 3(b) END ###
def main():
    # Change the host and path below to test other web sites!
    host = 'example.com'
    path = '/index.html'
    print(f"# Retrieve data from http://{host}{path}")
    # Get the IP address of the host
    ip_address = gethostbyname(host)
    print(f"> Remote server {host} resolved as {ip_address}")
    # Establish the TCP connection to the host
    sock = socket(AF_INET, SOCK_STREAM)
    sock.connect((ip_address, HTTP_PORT))
    print(f"> TCP Connection to {ip_address}:{HTTP_PORT} established")
    # Uncomment this comment block after Task 3(a)
    # Send an HTTP GET request
    http_get_request = create_http_request(host, path)
    print('\n# HTTP GET request ({} bytes)'.format(len(http_get_request)))
    print(http_get_request)
    sock.sendall(http_get_request)
    # Comment block for Task 3(a) END
    # Uncomment this comment block after Task 3(b)
    # Receive the HTTP response header
    header = receive_http_response_header(sock)
    print(type(header))
    print('\n# HTTP Response Header ({} bytes)'.format(len(header)))
    print(header)
    # Comment block for Task 3(b) END
    # Uncomment this comment block after Task 3(c)
    content_length = get_content_length(header)
    print('\n# Content-Length')
    print(f"{content_length} bytes")
    # Comment block for Task 3(c) END
    # Uncomment this comment block after Task 3(d)
    body = receive_body(sock, content_length)
    print('\n# Body ({} bytes)'.format(len(body)))
    print(body)
    # Comment block for Task 3(d) END

if __name__ == '__main__':
    main()
Answer
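"I have the length from the Content-Length header plus the header"
You don't. In receive_http_response_header you always check for HTTP_HEADER_DELIMITER against only the latest byte (chunk instead of header). A single byte can never contain the four-byte delimiter, so you never match the end of the header: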
while HTTP_HEADER_DELIMITER not in chunk:
    chunk = sock.recv(ONE_BYTE_LENGTH)
    if not chunk:
        break
    else:
        header += chunk
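You then simply assume that you have read the complete header, when in fact you have read the complete response. This means the additional recv you do when trying to read the response body can only return 0 bytes, because there is no more data: the body is already included in what you treat as the HTTP header.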
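One possible correction, as a minimal sketch rather than the only valid approach: test the delimiter against the accumulated header instead of the latest one-byte chunk. The function and constant names are taken from the question; the use of bytearray and endswith is an assumption of this sketch.

```python
import socket

HTTP_HEADER_DELIMITER = b'\r\n\r\n'
ONE_BYTE_LENGTH = 1


def receive_http_response_header(sock):
    """Read byte by byte until the accumulated data ends with the delimiter."""
    header = bytearray()
    # Compare against everything received so far, not the latest byte.
    while not header.endswith(HTTP_HEADER_DELIMITER):
        chunk = sock.recv(ONE_BYTE_LENGTH)
        if not chunk:
            # The peer closed the connection before the header was complete.
            break
        header += chunk
    return bytes(header)
```

Because the loop reads one byte at a time and stops as soon as the delimiter is complete, no body bytes are consumed, so the body is still waiting on the socket afterwards.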
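Apart from that, receive_body is also wrong, because it makes a mistake similar to the one in receive_http_response_header: the goal is not to recv content_length bytes again and again until no more bytes are available. The goal is to return once len(body) matches content_length, and to keep reading the remaining data as long as the body has not been read completely.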
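The body loop described above could look roughly like the following. This is a hedged sketch that keeps the question's function signature; asking recv for only the bytes still missing and stopping early when the peer closes the connection are choices made for this sketch.

```python
import socket


def receive_body(sock, content_length):
    """Read exactly content_length bytes, or fewer if the peer closes early."""
    body = bytearray()
    while len(body) < content_length:
        # recv may return fewer bytes than requested, so ask only for
        # what is still missing and loop until the body is complete.
        data = sock.recv(content_length - len(body))
        if not data:
            # Connection closed before the full body arrived.
            break
        body += data
    return bytes(body)
```

Returning as soon as len(body) reaches content_length matters on HTTP/1.1 connections, which stay open by default: waiting for recv to return an empty result would block until the server closes the connection or a timeout fires.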