python - How do I access images and image URLs in an RSS feed with Python?
Problem description
I currently have this Python code, which uses feedparser:
import feedparser

RSS_FEEDS = {'cnn': 'http://rss.cnn.com/rss/edition.rss'}

def get_news_test(publication="cnn"):
    feed = feedparser.parse(RSS_FEEDS[publication])
    articles_cnn = feed['entries']
    for article in articles_cnn:
        print(article)

get_news_test()
The code above returns all current articles. Here is an example of one of the articles it returns:
{'title': "China's internet shutdowns tactics are spreading worldwide", 'title_detail': {'type': 'text/plain', 'language': None, 'base': 'http://rss.cnn.com/rss/edition.rss', 'value': "China's internet shutdowns tactics are spreading worldwide"}, 'summary': 'When Hong Kong police fired tear gas at peaceful pro-democracy protesters in 2014, the news moved swiftly through social media. Photos and videos of mostly student demonstrators being gassed helped fuel the outrage that ultimately drove hundreds of thousands of people into the streets.', 'summary_detail': {'type': 'text/html', 'language': None, 'base': 'http://rss.cnn.com/rss/edition.rss', 'value': 'When Hong Kong police fired tear gas at peaceful pro-democracy protesters in 2014, the news moved swiftly through social media. Photos and videos of mostly student demonstrators being gassed helped fuel the outrage that ultimately drove hundreds of thousands of people into the streets.'}, 'links': [{'rel': 'alternate', 'type': 'text/html', 'href': 'https://www.cnn.com/2019/01/17/africa/internet-shutdown-zimbabwe-censorship-intl/index.html'}], 'link': 'https://www.cnn.com/2019/01/17/africa/internet-shutdown-zimbabwe-censorship-intl/index.html', 'id': 'https://www.cnn.com/2019/01/17/africa/internet-shutdown-zimbabwe-censorship-intl/index.html', 'guidislink': False, 'published': 'Fri, 18 Jan 2019 07:40:48 GMT', 'published_parsed': time.struct_time(tm_year=2019, tm_mon=1, tm_mday=18, tm_hour=7, tm_min=40, tm_sec=48, tm_wday=4, tm_yday=18, tm_isdst=0), 'media_content': [{'medium': 'image', 'url': 'https://cdn.cnn.com/cnnnext/dam/assets/190116165508-zimbabwe-protest-0115-01-super-169.jpg', 'height': '619', 'width': '1100'}, {'medium': 'image', 'url': 'https://cdn.cnn.com/cnnnext/dam/assets/190116165508-zimbabwe-protest-0115-01-large-11.jpg', 'height': '300', 'width': '300'}, {'medium': 'image', 'url': 'https://cdn.cnn.com/cnnnext/dam/assets/190116165508-zimbabwe-protest-0115-01-vertical-large-gallery.jpg', 'height': 
'552', 'width': '414'}, {'medium': 'image', 'url': 'https://cdn.cnn.com/cnnnext/dam/assets/190116165508-zimbabwe-protest-0115-01-video-synd-2.jpg', 'height': '480', 'width': '640'}, {'medium': 'image', 'url': 'https://cdn.cnn.com/cnnnext/dam/assets/190116165508-zimbabwe-protest-0115-01-live-video.jpg', 'height': '324', 'width': '576'}, {'medium': 'image', 'url': 'https://cdn.cnn.com/cnnnext/dam/assets/190116165508-zimbabwe-protest-0115-01-t1-main.jpg', 'height': '250', 'width': '250'}, {'medium': 'image', 'url': 'https://cdn.cnn.com/cnnnext/dam/assets/190116165508-zimbabwe-protest-0115-01-vertical-gallery.jpg', 'height': '360', 'width': '270'}, {'medium': 'image', 'url': 'https://cdn.cnn.com/cnnnext/dam/assets/190116165508-zimbabwe-protest-0115-01-story-body.jpg', 'height': '169', 'width': '300'}, {'medium': 'image', 'url': 'https://cdn.cnn.com/cnnnext/dam/assets/190116165508-zimbabwe-protest-0115-01-t1-main.jpg', 'height': '250', 'width': '250'}, {'medium': 'image', 'url': 'https://cdn.cnn.com/cnnnext/dam/assets/190116165508-zimbabwe-protest-0115-01-assign.jpg', 'height': '186', 'width': '248'}, {'medium': 'image', 'url': 'https://cdn.cnn.com/cnnnext/dam/assets/190116165508-zimbabwe-protest-0115-01-hp-video.jpg', 'height': '144', 'width': '256'}]}
I know I can access individual parts of an article, such as the title, like this:
print(article.title)
However, I can't work out how to get the image data from the feed.
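Solution
Each article entry in this feed carries a list of assets, media_content. Each asset node contains the media type (I have only ever seen 'image'), the dimensions, the url, and so on.
To simply list the media type and url of every asset, you can use the following: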
import feedparser

feed = feedparser.parse("http://rss.cnn.com/rss/edition.rss")
for article in feed["entries"]:
    # .get() skips entries that have no media_content instead of raising
    for media in article.get("media_content", []):
        print(f"medium: {media['medium']}")
        print(f" url: {media['url']}")
Output:
medium: image
url: https://cdn.cnn.com/cnnnext/dam/assets/190107112254-01-game-of-thrones-spain-castle-of-zafra-t1-main.jpg
medium: image
url: https://cdn.cnn.com/cnnnext/dam/assets/190107112254-01-game-of-thrones-spain-castle-of-zafra-assign.jpg
medium: image
url: https://cdn.cnn.com/cnnnext/dam/assets/190107112254-01-game-of-thrones-spain-castle-of-zafra-hp-video.jpg
...
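Since each asset also carries its dimensions, you may want just one image per article instead of the whole list. The following is a minimal sketch of that idea, not part of the original answer: `largest_image` is a hypothetical helper name, and it assumes `width` and `height` arrive as numeric strings, as they do in the sample entry shown in the question.

```python
def largest_image(entry):
    """Return the URL of the largest image asset in an entry, or None.

    Hypothetical helper: `entry` is one item of feed["entries"]
    (a plain dict with the same keys works too).
    """
    best_url, best_area = None, -1
    for media in entry.get("media_content", []):
        if media.get("medium") != "image" or "url" not in media:
            continue
        try:
            # Dimensions are strings such as '1100' in the sample entry.
            area = int(media.get("width", 0)) * int(media.get("height", 0))
        except (TypeError, ValueError):
            area = 0  # treat missing or malformed dimensions as unknown
        if area > best_area:
            best_url, best_area = media["url"], area
    return best_url


# Two assets copied from the sample entry in the question.
sample_entry = {
    "media_content": [
        {"medium": "image", "url": "https://cdn.cnn.com/cnnnext/dam/assets/190116165508-zimbabwe-protest-0115-01-super-169.jpg", "height": "619", "width": "1100"},
        {"medium": "image", "url": "https://cdn.cnn.com/cnnnext/dam/assets/190116165508-zimbabwe-protest-0115-01-large-11.jpg", "height": "300", "width": "300"},
    ]
}
print(largest_image(sample_entry))  # prints the ...-super-169.jpg URL (1100x619)
```

With a parsed feed, you would call `largest_image(article)` inside the `for article in feed["entries"]` loop.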
If you want to request and save the assets of type 'image', you can use requests:
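import feedparser
import os
import requests

feed = feedparser.parse("http://rss.cnn.com/rss/edition.rss")
for article in feed["entries"]:
    # .get() skips entries that have no media_content instead of raising
    for media in article.get("media_content", []):
        if media["medium"] == "image":
            response = requests.get(media["url"], timeout=10)
            response.raise_for_status()  # stop on HTTP errors rather than saving an error page
            # Save under the last path component of the URL, in the current directory
            with open(os.path.basename(media["url"]), "wb") as handler:
                handler.write(response.content)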
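Two details of the download loop are worth a closer look. The sample entry in the question lists the same URL more than once (the `t1-main` image appears twice), so each such file is fetched repeatedly, and `os.path.basename` applied to a raw URL would keep any query string in the file name. The sketch below addresses both using only the standard library; `unique_image_urls` and `filename_from_url` are hypothetical helper names introduced here, and the `"download"` fallback name is an assumption.

```python
import posixpath
from urllib.parse import urlparse


def filename_from_url(url):
    """Return the last path component of a URL, ignoring query and fragment.

    Hypothetical helper; falls back to "download" when the path has no name.
    """
    return posixpath.basename(urlparse(url).path) or "download"


def unique_image_urls(entries):
    """Collect image URLs from all entries, keeping first-seen order.

    Hypothetical helper: `entries` is feed["entries"] or a list of dicts.
    """
    seen = set()
    urls = []
    for entry in entries:
        for media in entry.get("media_content", []):
            url = media.get("url")
            if media.get("medium") == "image" and url and url not in seen:
                seen.add(url)
                urls.append(url)
    return urls


# Small illustration with made-up entries shaped like feedparser's output.
demo_entries = [
    {"media_content": [
        {"medium": "image", "url": "https://example.com/img/a.jpg?size=large"},
        {"medium": "image", "url": "https://example.com/img/a.jpg?size=large"},
        {"medium": "image", "url": "https://example.com/img/b.jpg"},
    ]},
    {"title": "entry without media"},
]
for url in unique_image_urls(demo_entries):
    print(filename_from_url(url), "<-", url)
```

In the download loop, you would iterate over `unique_image_urls(feed["entries"])` and open `filename_from_url(url)` for writing instead of `os.path.basename(media["url"])`.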