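python - How to get an attribute value of an a tag with BeautifulSoup, on the condition that the title string contains a keyword
Problem description
I have HTML objects from this website that look like this: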
<li id="V7551"><a href="/childcare/studies/36149/datasets/1/sdaxml/variable?var=V7551" name="V7551" target="new" title='Item number: 31680 F1
Not counting work for school or a job, about how many hours a week do you spend on the Internet e-mailing, instant
messaging, gaming, shopping, searching, downloading music, etc.?
1="None" 2="Less than 1 hour" 3="1-2 hours" 4="3-5 hours" 5="6-9 hours" 6="10-19 hours" 7="20-29 hours"
8="30-39 hours" 9="40 or more"'><span class="select-varname">V7551</span> 2014 C09 #HR/W INTERNET S F1</a></li>
<li id="V7553"><a href="/childcare/studies/36149/datasets/1/sdaxml/variable?var=V7553" name="V7553" target="new" title='Item number: 31990 F1
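I want to loop over all variables (each variable is inside the "ul" tag) and extract the variable name (in this example "V7551") whenever the variable description (the title attribute of the a tag) contains a keyword such as "music".
I am trying to automate a task I currently do by hand, and unfortunately I have no web scraping experience. Does anyone have tips on how to proceed? Thanks!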
Solution
Based on your information, I understand that you want to get the "variable" of each a tag in the navigation whose title attribute contains one or more keywords. Posting some of the code you have written so far could help in finding a better solution.
A possible solution:
Define your list of keywords:
matches = ["music", "video", "phone"]
Loop over all a tags and check whether there is a match in the title of each a tag:
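# soup is the BeautifulSoup object built from the page's HTML
for link in soup.find_all("a"):
    title = link.get("title")  # None when the tag has no title attribute
    if title and any(x in title for x in matches):
        print(link.get("name"))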
Output example
V7310
V7539
V7588
V7551
V7553
V7590
V7562
V7563
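For reference, the steps above can be put together into a small self-contained sketch. The sample markup is a shortened stand-in for the page in the question, the entries V0001 and V0002 and the function name find_variable_names are made up for illustration, and the lower-casing of titles and keywords is an added assumption that makes the match case-insensitive, which the original loop does not do.

```python
from bs4 import BeautifulSoup

# Shortened sample markup modelled on the question's snippet.
# V0001 and V0002 are hypothetical entries added for illustration.
SAMPLE_HTML = """
<ul>
  <li id="V7551"><a href="/childcare/studies/36149/datasets/1/sdaxml/variable?var=V7551"
      name="V7551" target="new"
      title="Item number: 31680 F1 ... searching, downloading music, etc.?">
      <span class="select-varname">V7551</span> 2014 C09 #HR/W INTERNET S F1</a></li>
  <li id="V0001"><a href="#" name="V0001" title="Hours of sleep per night">V0001</a></li>
  <li id="V0002"><a href="#" name="V0002">V0002 (no title attribute)</a></li>
</ul>
"""


def find_variable_names(html, keywords):
    """Return the name attribute of every a tag whose title contains a keyword."""
    soup = BeautifulSoup(html, "html.parser")
    lowered = [k.lower() for k in keywords]  # assumption: ignore case
    names = []
    # attrs={"title": True} keeps only a tags that actually have a title
    for link in soup.find_all("a", attrs={"title": True}):
        title = link["title"].lower()
        if any(k in title for k in lowered):
            names.append(link.get("name"))
    return names


matches = ["music", "video", "phone"]
print(find_variable_names(SAMPLE_HTML, matches))  # ['V7551']
```

Filtering on the presence of the title attribute up front avoids calling the in operator on None for links that have no description. For the real page, the HTML string would come from the downloaded document instead of SAMPLE_HTML.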