python - Pull only a specific ul inside a div
Problem description
I have the following HTML:
<div class="a-fixed-left-grid-col a-col-left" id="zg-left-col" style="width:200px;margin-left:-200px;float:none;">
<ul id="zg_browseRoot">
<li class="zg_browseUp"> ‹
<a href="https://www.amazon.com/Best-Sellers/zgbs">Any Department</a>
</li>
<ul>
<li class="zg_browseUp"> ‹
<a href="https://www.amazon.com/Best-Sellers/zgbs/amazon-devices">Amazon Devices & Accessories</a>
</li>
<ul>
<li>
<span class="zg_selected"> Amazon Devices</span>
</li>
<ul>
<li><a href="https://www.amazon.com/Best-Sellers-Home-Security-Amazon/zgbs/amazon-devices/17386948011">Home Security from Amazon</a></li>
<li><a href="https://www.amazon.com/Best-Sellers-Amazon-Echo-Alexa-Devices/zgbs/amazon-devices/9818047011">Amazon Echo & Alexa Devices</a></li>
<li><a href="https://www.amazon.com/Best-Sellers-Dash-Buttons/zgbs/amazon-devices/10667898011">Dash Buttons</a></li>
<li><a href="https://www.amazon.com/Best-Sellers-Fire-TV/zgbs/amazon-devices/8521791011">Fire TV</a></li>
<li><a href="https://www.amazon.com/Best-Sellers-Fire-Tablets/zgbs/amazon-devices/6669703011">Fire Tablets</a></li>
<li><a href="https://www.amazon.com/Best-Sellers-Kindle-readers/zgbs/amazon-devices/6669702011">Kindle E-readers</a></li>
<li><a href="https://www.amazon.com/Best-Sellers-Amazon-Device-Bundles/zgbs/amazon-devices/16926003011">Device Bundles</a></li>
</ul>
</ul>
</ul>
</ul>
</div>
I want to extract only these links:
https://www.amazon.com/Best-Sellers-Home-Security-Amazon/zgbs/amazon-devices/17386948011
https://www.amazon.com/Best-Sellers-Amazon-Echo-Alexa-Devices/zgbs/amazon-devices/9818047011
https://www.amazon.com/Best-Sellers-Dash-Buttons/zgbs/amazon-devices/10667898011
https://www.amazon.com/Best-Sellers-Fire-TV/zgbs/amazon-devices/8521791011
https://www.amazon.com/Best-Sellers-Fire-Tablets/zgbs/amazon-devices/6669703011
https://www.amazon.com/Best-Sellers-Kindle-readers/zgbs/amazon-devices/6669702011
https://www.amazon.com/Best-Sellers-Amazon-Device-Bundles/zgbs/amazon-devices/16926003011
I tried the code below. It runs, but it returns the ul element itself rather than the list of URLs I want.
soup.find('div', class_= 'a-fixed-left-grid-col a-col-left').find_all('ul')[3]
Solution
You need the href attribute of every anchor tag inside that ul. Try this:
print([a['href'] for a in soup.find('div', class_= 'a-fixed-left-grid-col a-col-left').find_all('ul')[3].find_all('a')])
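The one-liner above depends on the target list being the fourth ul in the div, which may break if the nesting depth changes. As a hedged alternative, the sketch below anchors on the span with class zg_selected and takes the ul that follows its li. It assumes the markup keeps the shape shown in the question; the HTML constant is a trimmed copy with only two of the seven links, and the function name selected_category_links is made up for illustration.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the markup from the question (two of the seven links kept).
HTML = """
<div class="a-fixed-left-grid-col a-col-left" id="zg-left-col">
<ul id="zg_browseRoot">
<li class="zg_browseUp"><a href="https://www.amazon.com/Best-Sellers/zgbs">Any Department</a></li>
<ul>
<li class="zg_browseUp"><a href="https://www.amazon.com/Best-Sellers/zgbs/amazon-devices">Amazon Devices &amp; Accessories</a></li>
<ul>
<li><span class="zg_selected"> Amazon Devices</span></li>
<ul>
<li><a href="https://www.amazon.com/Best-Sellers-Fire-TV/zgbs/amazon-devices/8521791011">Fire TV</a></li>
<li><a href="https://www.amazon.com/Best-Sellers-Fire-Tablets/zgbs/amazon-devices/6669703011">Fire Tablets</a></li>
</ul>
</ul>
</ul>
</ul>
</div>
"""


def selected_category_links(html):
    """Return the hrefs in the ul that follows the selected category (hypothetical helper)."""
    soup = BeautifulSoup(html, "html.parser")
    selected = soup.find("span", class_="zg_selected")
    if selected is None:
        return []
    parent_li = selected.find_parent("li")
    if parent_li is None:
        return []
    # Assumption: the sub-category list is the next <ul> sibling of that <li>.
    sub_list = parent_li.find_next_sibling("ul")
    if sub_list is None:
        return []
    # href=True skips anchors that carry no href attribute.
    return [a["href"] for a in sub_list.find_all("a", href=True)]


links = selected_category_links(HTML)
print("\n".join(links))
```

The built-in html.parser is used here because it keeps the ul-inside-ul nesting as written. Other parsers may rearrange this invalid nesting, in which case the sibling lookup would need adjusting.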