How to use BeautifulSoup to get an attribute value of an a tag, based on a condition: the title string contains a keyword

Problem description

I have HTML elements from the website here that look like this:

<li id="V7551"><a href="/childcare/studies/36149/datasets/1/sdaxml/variable?var=V7551" name="V7551" target="new" title='Item number: 31680  F1

Not counting work for school or a job, about how many hours a week do you spend on the Internet e-mailing, instant
messaging, gaming, shopping, searching, downloading music, etc.?

1="None" 2="Less than 1 hour" 3="1-2 hours" 4="3-5 hours" 5="6-9 hours" 6="10-19 hours" 7="20-29 hours" 
8="30-39 hours" 9="40 or more"'><span class="select-varname">V7551</span> 2014 C09 #HR/W INTERNET S F1</a></li>
<li id="V7553"><a href="/childcare/studies/36149/datasets/1/sdaxml/variable?var=V7553" name="V7553" target="new" title='Item number: 31990  F1

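If the variable description (the title attribute of the a tag) contains a keyword, for example "music", I would like to loop over all the variables (each variable is inside the "ul" tag) and extract the variable name (in this case "V7551").

I am trying to automate a task I am doing, and unfortunately I have no web scraping experience. Does anyone have any tips on how to proceed? Thanks!!

Tags: python, html, web-scraping, beautifulsoup

Solution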


Based on your information, I understand that you want to get the "variable" (the name attribute) of each a tag in the navigation whose title attribute contains one or more keywords. Providing some of the code you have written could help to find a better solution.

A possible solution:

Define your list of keywords:

matches = ["music", "video", "phone"]

Loop over all a tags and check whether any keyword occurs in the title of the a tag:

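# soup is assumed to be a BeautifulSoup object built from the page HTML
for link in soup.find_all("a", title=True):  # skip a tags without a title
    if any(keyword in link["title"] for keyword in matches):
        print(link.get("name"))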

Example output:

V7310
V7539
V7588
V7551
V7553
V7590
V7562
V7563
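For reference, the two steps above can be put together into a self-contained sketch. The HTML in SAMPLE_HTML is a shortened, hypothetical stand-in for the markup in the question, and the helper name find_variable_names is an assumption made for illustration. The lowercase comparison is also an addition that was not part of the snippet above: it makes "Music" and "music" both match, so drop it if exact case matters.

```python
from bs4 import BeautifulSoup

# Hypothetical, shortened sample modeled on the markup in the question.
SAMPLE_HTML = """
<ul>
  <li id="V7551"><a href="/variable?var=V7551" name="V7551"
      title="How many hours a week do you spend downloading music?">V7551</a></li>
  <li id="V7552"><a href="/variable?var=V7552" name="V7552"
      title="How often do you read books?">V7552</a></li>
  <li id="V7553"><a href="/variable?var=V7553" name="V7553">V7553</a></li>
</ul>
"""


def find_variable_names(html, keywords):
    """Return the name attribute of every a tag whose title contains a keyword."""
    soup = BeautifulSoup(html, "html.parser")
    lowered = [keyword.lower() for keyword in keywords]
    names = []
    # title=True keeps only a tags that actually have a title attribute.
    for link in soup.find_all("a", title=True):
        title = link["title"].lower()
        if any(keyword in title for keyword in lowered):
            names.append(link.get("name"))
    return names


if __name__ == "__main__":
    print(find_variable_names(SAMPLE_HTML, ["music", "video", "phone"]))
```

Returning a list instead of printing makes it easier to reuse the names later, for example to write them to a file. For the real page, the HTML string would come from whatever you use to download it, which is not shown here.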
