Scraping a web page with Python and BeautifulSoup: error encountered

Problem description

r = requests.get('https://www.google.com/search?sxsrf=ALeKk001JpX8YqG_te4nMARfL4zgr0fsWQ%3A1590551416511&ei=eOPNXv7hHsnLrQH486_wBw&q=nzd+in+nis&oq=nzd+in+&gs_lcp=CgZwc3ktYWIQAxgAMgQIIxAnMgQIABBDMgQIABBDMgIIADIECAAQQzIECAAQQzICCAAyAggAMgIIADICCAA6BQgAEJECOggIABCDARCRAjoHCAAQgwEQQzoHCAAQFBCHAjoFCAAQgwE6CQgjECcQRhCCAlDrHVinNWCqRWgAcAB4AIABgQKIAZMLkgEFMC40LjOYAQCgAQGqAQdnd3Mtd2l6&sclient=psy-ab')

soup = BeautifulSoup(r.text, 'xml')


soup.findAll('div', {'class': 'dDoNo vk_bk gsrt gzfeS'})

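I am trying to scrape information from Google, but I get the following error: "AttributeError: ResultSet object has no attribute 'find'. You're probably treating a list of elements like a single element. Did you call find_all() when you meant to call find()?" When I try soup.find() instead, I get no results at all. Thanks for your help.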

Tags: python-3.x, web-scraping

Solution


You need to change soup = BeautifulSoup(r.text, 'xml'): use lxml instead of xml, so that the page is parsed as HTML rather than as XML. You can try this:

import requests
from bs4 import BeautifulSoup

# Send a browser-like User-Agent with the request.
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT x.y; Win64; x64; rv:10.0) Gecko/20100101 Firefox/10.0'}

url = "https://www.google.com/search?sxsrf=ALeKk001JpX8YqG_te4nMARfL4zgr0fsWQ%3A1590551416511&ei=eOPNXv7hHsnLrQH486_wBw&q=nzd+in+nis&oq=nzd+in+&gs_lcp=CgZwc3ktYWIQAxgAMgQIIxAnMgQIABBDMgQIABBDMgIIADIECAAQQzIECAAQQzICCAAyAggAMgIIADICCAA6BQgAEJECOggIABCDARCRAjoHCAAQgwEQQzoHCAAQFBCHAjoFCAAQgwE6CQgjECcQRhCCAlDrHVinNWCqRWgAcAB4AIABgQKIAZMLkgEFMC40LjOYAQCgAQGqAQdnd3Mtd2l6&sclient=psy-ab"

r = requests.get(url, headers=headers)

# Parse the response as HTML with lxml instead of as XML.
soup = BeautifulSoup(r.text, 'lxml')

# find_all() returns a list-like ResultSet of every matching <div>.
allDiv = soup.find_all('div', {'class': 'dDoNo vk_bk gsrt gzfeS'})

print(allDiv)
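
The AttributeError quoted in the question comes from calling find() on the ResultSet that find_all() returns, rather than on an individual tag. Here is a minimal offline sketch of the difference. It assumes a made-up HTML snippet (SAMPLE_HTML is hypothetical, not real Google markup) and uses the standard-library 'html.parser' so that it runs without lxml installed:

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for a downloaded page; this is NOT real Google markup.
SAMPLE_HTML = """
<html><body>
  <div class="dDoNo vk_bk gsrt gzfeS"><span>1.00</span></div>
  <div class="dDoNo vk_bk gsrt gzfeS"><span>2.00</span></div>
  <div class="other"><span>ignored</span></div>
</body></html>
"""

# Assumption: 'html.parser' is used here only to avoid the lxml dependency.
soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# find_all() returns a ResultSet (a list subclass), even if only one tag matches.
all_div = soup.find_all("div", {"class": "dDoNo vk_bk gsrt gzfeS"})

# Calling .find() on the ResultSet itself raises the AttributeError from the question.
try:
    all_div.find("span")
    raised = False
except AttributeError:
    raised = True

# Call .find() on each element of the ResultSet instead.
span_texts = [div.find("span").get_text() for div in all_div]

# find() returns the first matching Tag, or None when nothing matches.
first_div = soup.find("div", {"class": "dDoNo vk_bk gsrt gzfeS"})
missing = soup.find("div", {"class": "does-not-exist"})

print(raised, span_texts, first_div.span.get_text(), missing)
```

If find() returns None or find_all() returns an empty list on the live page, the class names in the response probably differ from the ones being searched for. Printing r.text and checking the markup that was actually returned is a reasonable first step.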
