python - Scraping all hrefs into a list with BeautifulSoup
Question
I want to scrape the links from this page and put them in a list.
I have this code:
import bs4 as bs
import urllib.request
source = urllib.request.urlopen('http://www.gcoins.net/en/catalog/236').read()
soup = bs.BeautifulSoup(source,'lxml')
links = soup.find_all('a', attrs={'class': 'view'})
print(links)
It produces the following output:
[<a class="view" href="/en/catalog/view/514">
<img alt="View details" height="32" src="/img/actions/file.png" title="View details" width="32"/>
</a>,
"""There are 28 lines more"""
<a class="view" href="/en/catalog/view/565">
<img alt="View details" height="32" src="/img/actions/file.png" title="View details" width="32"/>
</a>]
What I need is the following: ['/en/catalog/view/514', ... , '/en/catalog/view/565']
But when I then add this line: href_value = links.get('href')
I get an error.
Answer
Try:
soup = bs.BeautifulSoup(source, 'lxml')
# find_all returns a list of tags, so read href from each tag in turn
links = [i.get("href") for i in soup.find_all('a', attrs={'class': 'view'})]
print(links)
Output:
['/en/catalog/view/514', '/en/catalog/view/515', '/en/catalog/view/179080', '/en/catalog/view/45518', '/en/catalog/view/521', '/en/catalog/view/111429', '/en/catalog/view/522', '/en/catalog/view/182223', '/en/catalog/view/168153', '/en/catalog/view/523', '/en/catalog/view/524', '/en/catalog/view/60228', '/en/catalog/view/525', '/en/catalog/view/539', '/en/catalog/view/540', '/en/catalog/view/31642', '/en/catalog/view/553', '/en/catalog/view/558', '/en/catalog/view/559', '/en/catalog/view/77672', '/en/catalog/view/560', '/en/catalog/view/55377', '/en/catalog/view/55379', '/en/catalog/view/32001', '/en/catalog/view/561', '/en/catalog/view/562', '/en/catalog/view/72185', '/en/catalog/view/563', '/en/catalog/view/564', '/en/catalog/view/565']
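The reason the original attempt fails is that find_all returns a list-like ResultSet of tags rather than a single tag, so calling .get('href') on the whole result would typically raise an AttributeError. The attribute has to be read from each tag individually. Below is a minimal offline sketch of that idea. The SAMPLE_HTML string and the extract_hrefs helper are hypothetical names introduced only for illustration, and the built-in "html.parser" is swapped in for 'lxml' so the snippet needs no extra parser installed.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, modelled on the output shown in the question.
SAMPLE_HTML = """
<a class="view" href="/en/catalog/view/514"><img alt="View details"/></a>
<a class="other" href="/en/ignored">skip</a>
<a class="view" href="/en/catalog/view/565"><img alt="View details"/></a>
"""


def extract_hrefs(html):
    """Return the href of every <a class="view"> tag, in document order."""
    soup = BeautifulSoup(html, "html.parser")
    # find_all gives back a ResultSet (a list of Tag objects) ...
    anchors = soup.find_all("a", attrs={"class": "view"})
    # ... so .get("href") is called on each Tag, not on the ResultSet itself.
    return [a.get("href") for a in anchors]


if __name__ == "__main__":
    print(extract_hrefs(SAMPLE_HTML))
    # ['/en/catalog/view/514', '/en/catalog/view/565']
```

Tag.get returns None when an attribute is missing, so a tag without an href would show up as None in the list rather than raising.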