BeautifulSoup: extracting an <a> tag's href from an RSS feed

Problem description

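Using Python 3 and BeautifulSoup, I am trying to parse an RSS feed in which each <description> tag contains an <a> tag wrapping an <img> tag.

I only want to get:

  1. the href of the <a> tag
  2. the src of the <img> tag

import requests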
from bs4 import BeautifulSoup
from bs4 import CData

tp_api = "https://timesofindia.indiatimes.com/rssfeeds/-2128936835.cms"
response = requests.get(tp_api)
soup = BeautifulSoup(response.text, 'xml')
results = soup.find_all('item',)
records = []
for result in results:
    main = result.find('description').string
    images = main
    print(main)

The output we get:

<a href="https://timesofindia.indiatimes.com/india/maharashtra-congress-demands-complete-loan-waiver-for-flood-hit-farmers/articleshow/70675961.cms"><img border="0" hspace="10" align="left" style="margin-top:3px;margin-right:5px;" src="https://timesofindia.indiatimes.com/photo/70675961.cms" /></a>The Congress on Wednesday sought a complete loan waiver for farmers affected by floods in Maharashtra and demanded that the state government provide them an assistance of Rs 60,000 per hectare of crop damage.

Tags: python, web-scraping, beautifulsoup

Solution


import requests
from bs4 import BeautifulSoup

tp_api = "https://timesofindia.indiatimes.com/rssfeeds/-2128936835.cms"
response = requests.get(tp_api)
# Parse the feed itself; each <item> holds one story.
soup = BeautifulSoup(response.text, 'html.parser')
results = soup.find_all('item')
for result in results:
    # The <description> text is itself HTML, so parse it a second time.
    main = BeautifulSoup(result.find('description').string, 'html.parser')
    a_tag = main.find('a')

# Runs after the loop, so this prints the <a> tag of the last item only.
print(a_tag)

Output:

<a href="https://timesofindia.indiatimes.com/india/delhi-hc-stays-jnu-inquiry-against-teachers-for-participating-in-protest/articleshow/70676842.cms"><img align="left" border="0" hspace="10" src="https://timesofindia.indiatimes.com/photo/70676842.cms" style="margin-top:3px;margin-right:5px;"/></a>
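
The question asks for the href and src values rather than the whole tag. A minimal sketch of that last step follows, assuming the description fragment has the same shape as the output above. The helper name extract_link_and_image and the sample variable are hypothetical and used only for illustration; the sample string is copied from the output above so the snippet runs without a network request.

```python
from bs4 import BeautifulSoup


def extract_link_and_image(description_html):
    """Return (href, src) from a description fragment.

    Hypothetical helper: either value is None when the tag or
    attribute is missing from the fragment.
    """
    fragment = BeautifulSoup(description_html, "html.parser")
    a_tag = fragment.find("a")
    img_tag = fragment.find("img")
    # Tag.get returns None instead of raising when the attribute is absent.
    href = a_tag.get("href") if a_tag is not None else None
    src = img_tag.get("src") if img_tag is not None else None
    return href, src


# Sample fragment copied from the output shown above.
sample = (
    '<a href="https://timesofindia.indiatimes.com/india/'
    'delhi-hc-stays-jnu-inquiry-against-teachers-for-participating-in-protest/'
    'articleshow/70676842.cms">'
    '<img align="left" border="0" hspace="10" '
    'src="https://timesofindia.indiatimes.com/photo/70676842.cms" '
    'style="margin-top:3px;margin-right:5px;"/></a>'
)

href, src = extract_link_and_image(sample)
print(href)
print(src)
```

Inside the loop from the solution, the same helper could be called on result.find('description').string for every item and the returned pairs appended to a list, so that all items are collected instead of only the last one.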
