HTTP response code comes back wrong when it should actually be 200

Problem description

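I am trying to extract HTTP links from an XML file and then fetch the HTTP response code for each link. Interestingly, I get 500 or 404, yet if I click the URL, the image loads correctly in the browser.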

My code is:

import re
import requests

def extract_src_link(path):
    with open(path, 'r') as myfile:
        for line in myfile:
            if "src" in line:
                # Capture everything between src= and ptype="2"
                src_link = re.search('src=(.+?)ptype="2"', line)
                url = src_link.group(1)
                # Drop the first and last character around the URL
                url = url[1:-1]
                #print ("url:", url)
                resp = requests.head(url)
                print(resp.status_code)

I'm not sure what is happening here. This is what my output looks like:

/usr/local/bin/python2.7 /Users/rradhakrishnan/Projects/eVision/Scripts/xml_validator_ver3.py
Processing: /Users/rradhakrishnan/rradhakrishnan1/mobily/E30000001554522119_2020_01_27T17_35_40Z.xml
500
404
Processing: /Users/rradhakrishnan/rradhakrishnan1/mobily/E30000001557496079_2020_01_27T17_35_40Z.xml
500
404

Tags: python-2.7

Solution


I somehow managed to crack it. Adding a User-Agent header did solve the problem.

import re
import requests

def extract_src_link(path):
    with open(path, 'r') as myfile:
        for line in myfile:
            if "src" in line:
                src_link = re.search('src=(.+?)ptype="2"', line)
                if src_link is None:
                    continue  # line mentions "src" but does not match the pattern
                url = src_link.group(1)
                url = url[1:-1]
                print("url: %s" % url)
                # resp = requests.head(url)
                # print(resp.status_code)
                # Send a browser-like User-Agent with the request
                headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/37.0.2049.0 Safari/537.36'}

                r = requests.get(url, headers=headers)
                print(r.status_code)
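
The User-Agent fix described above can also be sketched as two small helpers, which may make it easier to check each step on its own. This is only a rough sketch under assumptions the original post does not confirm: it assumes the attribute is written as src="..." with double quotes, and the names extract_src_urls, status_with_user_agent, SRC_PATTERN and DEFAULT_USER_AGENT are hypothetical. The session argument is expected to be something like requests.Session(); it is passed in so the logic can be tried without network access.

```python
import re

# Assumption: the attribute is written as src="http://..." (double-quoted).
# The original post does not show the XML, so this pattern is illustrative only.
SRC_PATTERN = re.compile(r'src="([^"]+)"')

# Hypothetical default; any realistic browser User-Agent string could be used.
DEFAULT_USER_AGENT = (
    "Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/37.0.2049.0 Safari/537.36"
)


def extract_src_urls(line):
    """Return every double-quoted src value found in one line of text."""
    return SRC_PATTERN.findall(line)


def status_with_user_agent(url, session, user_agent=DEFAULT_USER_AGENT):
    """Request url through session with a User-Agent header; return the status code.

    session is any object exposing get(url, headers=...) whose result has a
    status_code attribute, for example requests.Session().
    """
    response = session.get(url, headers={"User-Agent": user_agent})
    return response.status_code
```

With requests installed, calling status_with_user_agent(url, requests.Session()) for each value returned by extract_src_urls(line) should perform the same check as the loop above. Capturing only the text between the quotes avoids relying on url[1:-1] to trim the surrounding characters.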
