python - Scraping IMG SRC under a DIV tag with BeautifulSoup
Problem description
I am trying to get the src of an image that sits under a div tag. My code gives me an error: KeyError: 'src'
Here is my code:
import sys
from urllib.request import urlopen

from bs4 import BeautifulSoup

# csv_writer is created earlier in the script (not shown in the question)
for page in range(1, 4):
    # code that gets dynamic URL
    url = sys.argv[1] + "{}".format(page)
    print(url)
    html = urlopen(url)
    soup = BeautifulSoup(html, "lxml")
    for article in soup.find_all('article', class_='o-hit'):
        div = soup.find('div', {"class": "o-rating_thumb@m-"})
        img_src = div.find('img').attrs['src']
        #img_src = article.find('div',class_ ='o-rating_thumb c-white').img['src']
        headline = article.h2.text.strip()
        summary = article.find('p', class_='mt-15@m+ t-d5@m- t-d5@tp+ c-gray-3').text.strip()
        #img_src = "none"
        print(headline)
        print(summary)
        print(img_src)
        csv_writer.writerow([headline, summary, img_src])
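The web page is here: Engadget blog, page 10
Solution
For the topmost news item on each page, you can get the image source from the 'src' attribute directly.
You can first use the find() method to navigate to the div that contains the image. Then, inside that div, find the img tag and read its source from its attributes.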
import requests
from bs4 import BeautifulSoup
url='https://www.engadget.com/reviews/latest/page/10/'
res=requests.get(url)
soup=BeautifulSoup(res.text,'html.parser')
div=soup.find('div',{"class":"o-rating_thumb@m-"})
print(div.find('img').attrs['src'])
Output:
https://o.aolcdn.com/images/dims?resize=810%2C455&crop=810%2C455%2C0%2C0&quality=80&image_uri=https%3A%2F%2Fo.aolcdn.com%2Fimages%2Fdims%3Fcrop%3D1400%252C933%252C0%252C0%26quality%3D85%26format%3Djpg%26resize%3D1600%252C1066%26image_uri%3Dhttp%253A%252F%252Fo.aolcdn.com%252Fhss%252Fstorage%252Fmidas%252F85a4e2b124ba329ab520e80e306f07eb%252F206517051%252FIMG_5243e.jpg%26client%3Da1acac3e1b3290917d92%26signature%3Dcea6158d0bf02768d31ee67f2694be6cafaf200c&client=amp-blogside-v2&signature=08a97a1109f1c3287c6766fa284104c6f78770fe
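Edit: scraping the image source of every news item on the page:
Even though the first image has a src attribute, the subsequent images have to be scraped through the data-original attribute instead (you can confirm this by viewing the page source). I think this is why you are getting the KeyError: 'src'.
I was able to scrape all the news items like this: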
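A minimal offline sketch of why attrs['src'] fails: the HTML string below is a hypothetical, simplified stand-in for a lazy-loaded thumbnail (not Engadget's actual markup), with only a data-original attribute. Indexing a Tag with a missing attribute raises KeyError, while Tag.get() returns None instead.

```python
from bs4 import BeautifulSoup

# Hypothetical lazy-loaded thumbnail: data-original is set, src is missing
html = '<div class="o-rating_thumb"><img data-original="https://example.com/thumb.jpg"></div>'
soup = BeautifulSoup(html, "html.parser")
img = soup.find("div", class_="o-rating_thumb").find("img")

try:
    img.attrs["src"]  # same lookup as in the question
except KeyError as exc:
    print("KeyError:", exc)  # prints: KeyError: 'src'

# .get() returns None (or a supplied default) instead of raising
print(img.get("src"))  # prints: None
print(img.get("data-original"))  # prints: https://example.com/thumb.jpg
```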
import requests
from bs4 import BeautifulSoup
url='https://www.engadget.com/reviews/latest/page/10/'
res=requests.get(url)
soup=BeautifulSoup(res.text,'html.parser')
articles=soup.find_all('article',{"class":"o-hit"})
for article in articles:
    print("Heading: ", article.find('h2').text.strip())  # heading
    print("Summary: ", article.find('p').text.strip())  # summary
    print("Image Source:", article.find('img').attrs['data-original'])  # image src
    print()
Output:
Heading: Netflix will remove user reviews from its website next month
Summary: Last year five-star ratings got the ax, and now written reviews will fade away too.
Image Source: https://o.aolcdn.com/images/dims?thumbnail=300%2C200&quality=80&image_uri=https%3A%2F%2Fs.aolcdn.com%2Fhss%2Fstorage%2Fmidas%2F884e68f9a829f3a26db5b729f00ccd03%2F206508290%2FEnglish.jpg&client=amp-blogside-v2&signature=b37eb21e95cef8cebe1f3c741b8bb29eb3471dcc
Heading: Smart ForTwo Electric Drive quick spin review
Summary: The saddest way to spend $25,000.
Image Source: https://o.aolcdn.com/images/dims?thumbnail=300%2C200&quality=80&image_uri=https%3A%2F%2Fs.aolcdn.com%2Fhss%2Fstorage%2Fmidas%2Fedbdfdfeff2e77567cd0c4a73484d108%2F206502307%2Fsmartfortwo.jpg&client=amp-blogside-v2&signature=a9fc05d80d4b4d8ba6ef33453510c138632bab81
Heading: Vivo's all-screen NEX S is a frustrating glimpse of the future
Summary: Spoiler alert: It's really cool, but don't bother importing one.
Image Source: https://o.aolcdn.com/images/dims?thumbnail=386%2C217&quality=80&image_uri=https%3A%2F%2Fimg.vidible.tv%2Fprod%2F2018-06%2F29%2F5b36ac0e523dc352bd46785a%2F5b36aedc884c2354eb33d663_1920x1080_U_v1.jpg&client=amp-blogside-v2&signature=725c8033196a2ae3500e2144830d14b03e7abc0e
Heading: Sonos Beam review: Smart features trump minor audio compromises
Summary: Bringing the soundbar into the smart home era.
Image Source: https://o.aolcdn.com/images/dims?thumbnail=386%2C217&quality=80&image_uri=https%3A%2F%2Fimg.vidible.tv%2Fprod%2F2018-06%2F27%2F5b32f579523dc352bd3f66f3%2F5b32fbf2884c2354eb33d62f_1920x1080_U_v1.jpg&client=amp-blogside-v2&signature=4ad311aeb5cb23907fd99ec12d962b148646163d
Heading: BlackBerry KEY2 review: The undisputed keyboard king
Summary: This is the best Android-powered BlackBerry, if that means anything to you.
Image Source: https://o.aolcdn.com/images/dims?thumbnail=386%2C217&quality=80&image_uri=https%3A%2F%2Fimg.vidible.tv%2Fprod%2F2018-06%2F26%2F5b3188ee523dc36212a7ff02%2F5b318be5802b94347b7e586b_1920x1080_U_v1.jpg&client=amp-blogside-v2&signature=5438cdf814480be5856d38db73695f86ade186ea
Heading: Amazon Echo Look review: Good selfie taker, so-so stylist
Summary: An AI is no match for my style instincts.
Image Source: https://o.aolcdn.com/images/dims?thumbnail=386%2C217&quality=80&image_uri=https%3A%2F%2Fimg.vidible.tv%2Fprod%2F2018-06%2F25%2F5b30cbfce880db6107cb7ad0%2F5b30cde61aa5fc22c7bbf187_1920x1080_U_v1.jpg&client=amp-blogside-v2&signature=308e9f00afcb968da05823ce0d0718ccc6e43cb4
Heading: Mitsubishi’s Outlander Plug-In Hybrid is an understated surprise
Summary: Mitsubishi is back, even though it actually never left.
Image Source: https://o.aolcdn.com/images/dims?thumbnail=386%2C217&quality=80&image_uri=https%3A%2F%2Fimg.vidible.tv%2Fprod%2F2018-06%2F21%2F5b2bc80f523dc36212a2be79%2F5b2bc8a6884c2319c410c008_1920x1080_U_v1.jpg&client=amp-blogside-v2&signature=a00b8466fa281051de4d64b1223fe99f97315985
Heading: Amazon Fire TV Cube review: Alexa still needs work as a TV guide
Summary: This device was bound to be made at some point, but is it worth it?
Image Source: https://o.aolcdn.com/images/dims?thumbnail=386%2C217&quality=80&image_uri=https%3A%2F%2Fimg.vidible.tv%2Fprod%2F2018-06%2F21%2F5b2bb81edbaab36faf00ed0e%2F5b2bddfb884c2319c410c00c_1920x1080_U_v1.jpg&client=amp-blogside-v2&signature=baa2db64e12d013ab712d823238fc3efeee693f8
Heading: HTC U12+ review: Fundamentally flawed
Summary: The phone's pressure-sensitive power and volume keys are kinda the worst.
Image Source: https://o.aolcdn.com/images/dims?thumbnail=386%2C217&quality=80&image_uri=https%3A%2F%2Fimg.vidible.tv%2Fprod%2F2018-06%2F21%2F5b28cd94f50775726418990a%2F5b2bd7d4b46ab33c496c1607_1920x1080_U_v1.jpg&client=amp-blogside-v2&signature=8518ce5c141fb85b935794fbd3bd283d32508484
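A possible, more defensive variant, sketched under assumptions: it prefers data-original and falls back to src, so neither the first image nor the lazy-loaded ones raise KeyError. It also scopes every lookup to article; the loop in the question calls soup.find inside the article loop, which returns the same first matching div on every iteration. The helper names image_url and extract_items and the sample markup are hypothetical, and the data-original-then-src order is an assumption about how the page marks up lazy-loaded images.

```python
from bs4 import BeautifulSoup


def image_url(img):
    """Return the image URL, preferring data-original (lazy-loaded) over src."""
    if img is None:
        return None
    return img.get("data-original") or img.get("src")


def extract_items(html):
    """Collect (headline, summary, image URL) tuples from each article.o-hit.

    Lookups are scoped to `article`, not `soup`, so each row gets its own image.
    Missing tags yield None instead of raising.
    """
    soup = BeautifulSoup(html, "html.parser")
    items = []
    for article in soup.find_all("article", class_="o-hit"):
        h2 = article.find("h2")
        p = article.find("p")
        items.append((
            h2.get_text(strip=True) if h2 else None,
            p.get_text(strip=True) if p else None,
            image_url(article.find("img")),
        ))
    return items


# Hypothetical sample: first item has a real src, second is lazy-loaded
sample = """
<article class="o-hit"><h2> First </h2><p>One</p><img src="a.jpg"></article>
<article class="o-hit"><h2>Second</h2><p>Two</p><img src="placeholder.gif" data-original="b.jpg"></article>
"""

for row in extract_items(sample):
    print(row)
```

Each tuple has the same shape as the row passed to csv_writer.writerow in the question, so the rows could be written out directly.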