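python - beautifulsoup4 returns no content
Problem description
Hi, I followed and understood this article on how to read content from a website, and it works well: geeksforgeeks.org: Reading selected pages content using Python Web Scraping
But when I change my code to use another site, it does not return any values. I am trying to get Value1, Value2, and so on, as shown below.
Please note: reading content from this web page is legal.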
import requests
from bs4 import BeautifulSoup

# the target we want to open
url = 'https://hackerone.com/directory?offers_bounties=true&asset_type=URL&order_direction=DESC&order_field=started_accepting_at'

# open with the GET method
resp = requests.get(url)

# HTTP status 200 means OK
if resp.status_code == 200:
    print("Successfully opened the web page")
    print("The news are as follow :-\n")

    # we need a parser; Python's built-in HTML parser is enough
    soup = BeautifulSoup(resp.text, 'html.parser')

    # l should be the first <tr> with this class string (find() returns None if nothing matches)
    l = soup.find("tr", "spec-directory-entry daisy-table__row fade fade--show")

    # now we want to print the text of each child of that row
    for i in l:
        print(i.text)
else:
    print("Error")
Here is the site's HTML source:
<tr class="spec-directory-entry daisy-table__row fade fade--show">
<a href="/livestream" class="daisy-link spec-profile-name">Value1</a>
<tr class="spec-directory-entry daisy-table__row fade fade--show">
<a href="/livestream" class="daisy-link spec-profile-name">Value2</a>
<tr class="spec-directory-entry daisy-table__row fade fade--show">
.
.
.
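Solution
The page relies on JavaScript to render its content, and requests.get() only downloads the initial HTML without running any scripts, so the table rows are not in resp.text. Using the prerender.io service is a simple way to get the rendered page and the data you are looking for.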
import requests
from bs4 import BeautifulSoup

# the target we want to open
# changed to go through the prerender.io service
url = 'http://service.prerender.io/https://hackerone.com/directory?offers_bounties=true&asset_type=URL&order_direction=DESC&order_field=started_accepting_at'

# open with the GET method
resp = requests.get(url)

# HTTP status 200 means OK
if resp.status_code == 200:
    print("Successfully opened the web page")
    print("The news are as follow :-\n")

    # we need a parser; Python's built-in HTML parser is enough
    soup = BeautifulSoup(resp.text, 'html.parser')

    # l is the first <tr> with this class string
    l = soup.find("tr", "spec-directory-entry daisy-table__row fade fade--show")

    # print the text of each child (cell) of that row
    for i in l:
        print(i.text)
else:
    print("Error")
Data returned by the code above:
Successfully opened the web page
The news are as follow :-
LivestreamManaged
04 / 2019
73
$100
$150-$250
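To see why the original script found nothing, it can help to count the matching rows in whatever HTML was downloaded. The following is a minimal offline sketch, not the answer author's code: the helper name count_directory_rows and both HTML strings are hypothetical stand-ins (the second is modeled on the snippet in the question, with <td> wrappers assumed).

```python
from bs4 import BeautifulSoup

ROW_CLASS = "spec-directory-entry"  # one of the classes on each <tr> in the question


def count_directory_rows(html):
    """Return how many <tr> elements carrying ROW_CLASS appear in the given HTML."""
    soup = BeautifulSoup(html, "html.parser")
    return len(soup.find_all("tr", class_=ROW_CLASS))


# Hypothetical stand-in for what a plain requests.get() might receive:
# an empty application shell that JavaScript fills in later.
unrendered = '<html><body><div id="app"></div><script src="app.js"></script></body></html>'

# Hypothetical stand-in for the rendered page, modeled on the question's snippet.
rendered = """
<table>
  <tr class="spec-directory-entry daisy-table__row fade fade--show">
    <td><a href="/livestream" class="daisy-link spec-profile-name">Value1</a></td>
  </tr>
  <tr class="spec-directory-entry daisy-table__row fade fade--show">
    <td><a href="/livestream" class="daisy-link spec-profile-name">Value2</a></td>
  </tr>
</table>
"""

print(count_directory_rows(unrendered))  # 0
print(count_directory_rows(rendered))    # 2
```

If the count is 0 for resp.text from the direct URL, the rows are most likely being added by JavaScript after the page loads, which is the situation a prerendering service works around.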
Edit: in response to Ahmad's comment
Here is code that gets only the values of the "Livestream" table row.
import requests
from bs4 import BeautifulSoup

# the target we want to open
# changed to go through the prerender.io service
url = 'http://service.prerender.io/https://hackerone.com/directory?offers_bounties=true&asset_type=URL&order_direction=DESC&order_field=started_accepting_at'

# open with the GET method
resp = requests.get(url)

# HTTP status 200 means OK
if resp.status_code == 200:
    print("Successfully opened the web page")
    print("The news are as follow :-\n")

    # we need a parser; Python's built-in HTML parser is enough
    soup = BeautifulSoup(resp.text, 'html.parser')

    # l is the list of all matching "tr" tags
    l = soup.find_all("tr", "spec-directory-entry daisy-table__row fade fade--show")

    # loop through the list of table rows
    for i in l:
        # check whether the current row is the one for 'Livestream'
        if i.find('a').text == 'Livestream':
            # print the row's values, skipping the first "td" tag
            for e in i.find_all('td')[1:]:
                print(e.text)
else:
    print("Error")
Result:
Successfully opened the web page
The news are as follow :-
04 / 2019
73
$100
$150-$250
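The row lookup above can also be written so that it does not depend on the full class string or on every row containing a link. This is a sketch under assumptions rather than the answer's method: the function name rows_by_name and the sample HTML are hypothetical, and the cell layout is guessed from the output shown above.

```python
from bs4 import BeautifulSoup


def rows_by_name(html):
    """Map each program name to the text of the remaining cells in its row."""
    soup = BeautifulSoup(html, "html.parser")
    result = {}
    # Match on a single class so extra state classes (e.g. "fade--show") do not matter.
    for row in soup.find_all("tr", class_="spec-directory-entry"):
        link = row.find("a", class_="spec-profile-name")
        if link is None:
            continue  # skip rows without a profile link instead of raising AttributeError
        cells = row.find_all("td")[1:]  # skip the first cell, which holds the name
        result[link.get_text(strip=True)] = [c.get_text(strip=True) for c in cells]
    return result


# Hypothetical sample; the <td> layout is an assumption based on the printed result.
sample = """
<table>
  <tr class="spec-directory-entry daisy-table__row fade fade--show">
    <td><a href="/livestream" class="daisy-link spec-profile-name">Livestream</a></td>
    <td>04 / 2019</td><td>73</td><td>$100</td><td>$150-$250</td>
  </tr>
  <tr class="spec-directory-entry daisy-table__row">
    <td>no link in this row</td>
  </tr>
</table>
"""

print(rows_by_name(sample).get("Livestream"))
```

Returning a dictionary keeps the parsing separate from the network request, so the same function can be fed resp.text from the prerendered URL or a saved copy of the page.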