Using a FOR loop and IF with BeautifulSoup in Python

Problem description

I'm trying to extract the meta descriptions of some web pages. Here is my code:

import requests
from bs4 import BeautifulSoup

# Assumed: the question does not show how headers is defined
headers = {'User-Agent': 'Mozilla/5.0'}

URL_List = ['https://digisapient.com', 'https://dataquest.io']
Meta_Description = []

for url in URL_List:
    response = requests.get(url, headers=headers)
    #lower_response_text = response.text.lower()
    soup = BeautifulSoup(response.text, 'lxml')
    metas = soup.find_all('meta')
    for m in metas:
        if m.get('name') == 'description':
            desc = m.get('content')
            Meta_Description.append(desc)
        else:
            # Runs for every non-description <meta> tag on the page
            desc = "Not Found"
            Meta_Description.append(desc)

This currently returns the following:

['Not Found',
 'Not Found',
 'Not Found',
 'Not Found',
 'Learn Python, R, and SQL skills. Follow career paths to become a job-qualified data scientist, analyst, or engineer with interactive data science courses!',
 'Not Found',
 'Not Found',
 'Not Found',
 'Not Found']
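The extra entries come from the else branch: it runs once for every <meta> tag whose name is not 'description', not once per URL. The dependency-free sketch below models one page's tags as plain dicts (a hypothetical stand-in for BeautifulSoup Tag objects, which also support .get) and contrasts the question's per-tag loop with a per-page lookup. The tag data and the function names per_tag and per_page are made up for illustration.

```python
# Hypothetical <meta> tags from a single page; dicts stand in for bs4 Tags
page_metas = [
    {"charset": "utf-8"},
    {"name": "viewport", "content": "width=device-width"},
    {"name": "description", "content": "An example page description"},
    {"property": "og:title", "content": "Example"},
]


def per_tag(metas):
    """Mirror the question's loop: appends one entry for EVERY <meta> tag."""
    out = []
    for m in metas:
        if m.get("name") == "description":
            out.append(m.get("content"))
        else:
            out.append("Not Found")
    return out


def per_page(metas):
    """One result per page: the first description found, else 'Not Found'."""
    for m in metas:
        if m.get("name") == "description":
            return m.get("content")
    return "Not Found"


print(per_tag(page_metas))   # four entries for a single page
print(per_page(page_metas))  # prints: An example page description
```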

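I want to pull the content attribute of the meta tag where name == 'description'. If the condition doesn't match, i.e. the page has no meta tag with name == 'description', it should return 'Not Found' for that page.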

Expected output:

['Not Found',
 'Learn Python, R, and SQL skills. Follow career paths to become a job-qualified data scientist, analyst, or engineer with interactive data science courses!']
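One way to get exactly one entry per page is to let BeautifulSoup look up the tag directly: find() returns the first matching tag, or None if there is none. The attribute filter goes in attrs= because name is already find()'s tag-name parameter. This is a minimal sketch, assuming the HTML has already been downloaded; the helper name get_meta_description and the sample HTML are hypothetical, and html.parser is used so it runs without lxml.

```python
from bs4 import BeautifulSoup


def get_meta_description(html, default="Not Found"):
    """Return the content of <meta name="description">, or default if absent."""
    soup = BeautifulSoup(html, "html.parser")
    # attrs= is required: the keyword "name" would mean the tag name in find()
    tag = soup.find("meta", attrs={"name": "description"})
    if tag is None:
        return default
    return tag.get("content", default)


# Hypothetical pages: the first has no description, the second has one
pages = [
    '<html><head><meta charset="utf-8"></head></html>',
    '<html><head><meta name="description" content="Example text"></head></html>',
]
print([get_meta_description(p) for p in pages])
# ['Not Found', 'Example text']
```

In the question's loop, the whole inner for loop could then become Meta_Description.append(get_meta_description(response.text)).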

Please advise.

Tags: python-3.x, for-loop, if-statement, beautifulsoup

Solution


Let me know if this works for you!

import requests
from bs4 import BeautifulSoup

# Assumed: the question does not show how headers is defined
headers = {'User-Agent': 'Mozilla/5.0'}

URL_List = ['https://digisapient.com', 'https://dataquest.io']
Meta_Description = []

for url in URL_List:
    response = requests.get(url, headers=headers)
    meta_flag = False  # reset for every page
    soup = BeautifulSoup(response.text, 'lxml')
    metas = soup.find_all('meta')
    for m in metas:
        if m.get('name') == 'description':
            Meta_Description.append(m.get('content'))
            meta_flag = True
            break  # one description per page is enough; stop scanning
    if not meta_flag:
        # No <meta name="description"> on this page
        Meta_Description.append("Not Found")

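The idea behind the code is that it iterates over all the items in metas; when it finds the 'description' tag, it appends its content and sets the flag to True, so the "if not meta_flag" check after the loop is skipped. If nothing was found after iterating over metas, the flag is still False and "Not Found" is appended to Meta_Description. Because the flag is reset at the start of each URL, every page contributes exactly one entry.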
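Python's for ... else construct can replace the flag: the else block runs only when the loop finishes without hitting break. A sketch under the same assumptions as above (HTML already fetched, html.parser instead of lxml); the helper name collect_descriptions and the sample pages are hypothetical.

```python
from bs4 import BeautifulSoup


def collect_descriptions(html_pages):
    """Return one description (or 'Not Found') per page, using for/else."""
    results = []
    for html in html_pages:
        soup = BeautifulSoup(html, "html.parser")
        for m in soup.find_all("meta"):
            if m.get("name") == "description":
                results.append(m.get("content"))
                break  # found it, so the else block below is skipped
        else:
            # The loop ended without break: no description tag on this page
            results.append("Not Found")
    return results


pages = [
    '<head><meta name="viewport" content="width=device-width"></head>',
    '<head><meta name="description" content="Example text"></head>',
]
print(collect_descriptions(pages))  # ['Not Found', 'Example text']
```

Compared with the flag, for/else keeps the "not found" handling attached to the loop it belongs to; both versions produce exactly one entry per page.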

