Retrieving the URL from href with Python 3/BeautifulSoup when the "a" is wrapped in "strong"

Question

I'm having a bit of trouble with this. Here is a sample of the HTML:

    <tr data-row="8">
    <th scope="row" class="left " data-append-csv="abramjo01" data-stat="player">
        <a href="/players/a/abramjo01.html">John Abramovic</a>
    </th>
        <td class="right " data-stat="year_min">1947</td>
    <td class="right " data-stat="year_max">1948</td>
        <td class="center " data-stat="pos">F</td>
        <td class="right " data-stat="height" csk="75.0">6-3</td><td class="right " data-stat="weight">195</td>
        <td class="left " data-stat="birth_date" csk="19190209">
            <a href="/friv/birthdays.cgi?month=2&amp;day=9">February 9, 1919</a></td>
        <td class="left " data-stat="colleges">
            <a href="/friv/colleges.fcgi?college=salemintl">Salem International University</a></td>
</tr>

<tr data-row="9">
    <th scope="row" class="left " data-append-csv="abrinal01" data-stat="player">
        <strong><a href="/players/a/abrinal01.html">Álex Abrines</a></strong>
    </th><td class="right " data-stat="year_min">2017</td>
    <td class="right " data-stat="year_max">2019</td><td class="center " data-stat="pos">G-F</td>
    <td class="right " data-stat="height" csk="78.0">6-6</td>
    <td class="right " data-stat="weight">200</td>
    <td class="left " data-stat="birth_date" csk="19930801">
        <a href="/friv/birthdays.cgi?month=8&amp;day=1">August 1, 1993</a></td>
    <td class="left iz" data-stat="colleges"></td>
</tr>

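In the first row, nothing has a "strong" tag, but in the second row there is an "a" tag wrapped in a "strong" tag. I want to get only that link. The result would be: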

/players/a/abrinal01.html

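There are many rows in this table, so I know I will use find_all, but I don't know how to express:

Get the 'href' only when the "a" tag has a "strong" tag around it.

Any help would be appreciated.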

Tags: python-3.x, web-scraping, beautifulsoup

Solution


from bs4 import BeautifulSoup
data = """
    <tr data-row="8">
    <th scope="row" class="left " data-append-csv="abramjo01" data-stat="player">
        <a href="/players/a/abramjo01.html">John Abramovic</a>
    </th>
        <td class="right " data-stat="year_min">1947</td>
    <td class="right " data-stat="year_max">1948</td>
        <td class="center " data-stat="pos">F</td>
        <td class="right " data-stat="height" csk="75.0">6-3</td><td class="right " data-stat="weight">195</td>
        <td class="left " data-stat="birth_date" csk="19190209">
            <a href="/friv/birthdays.cgi?month=2&amp;day=9">February 9, 1919</a></td>
        <td class="left " data-stat="colleges">
            <a href="/friv/colleges.fcgi?college=salemintl">Salem International University</a></td>
</tr>

<tr data-row="9">
    <th scope="row" class="left " data-append-csv="abrinal01" data-stat="player">
        <strong><a href="/players/a/abrinal01.html">Álex Abrines</a></strong>
    </th><td class="right " data-stat="year_min">2017</td>
    <td class="right " data-stat="year_max">2019</td><td class="center " data-stat="pos">G-F</td>
    <td class="right " data-stat="height" csk="78.0">6-6</td>
    <td class="right " data-stat="weight">200</td>
    <td class="left " data-stat="birth_date" csk="19930801">
        <a href="/friv/birthdays.cgi?month=8&amp;day=1">August 1, 1993</a></td>
    <td class="left iz" data-stat="colleges"></td>
</tr>
"""

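# Parse the sample with the HTML parser that ships with Python
soup = BeautifulSoup(data, 'html.parser')

# Visit every <strong> tag, then print the href of each <a> nested inside it
for item in soup.find_all('strong'):
    for a in item.find_all('a'):
        print(a.get('href'))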

Output:

/players/a/abrinal01.html
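
The same filter can also be written as a CSS selector. The following is a minimal sketch, not part of the original answer. It assumes a BeautifulSoup version whose select() supports child combinators and attribute selectors (4.7 or later with soupsieve). The helper name strong_links and the trimmed-down sample markup are made up for illustration.

```python
from bs4 import BeautifulSoup

# Trimmed-down version of the two rows from the question (illustrative only).
html = """
<table>
<tr data-row="8">
    <th data-stat="player"><a href="/players/a/abramjo01.html">John Abramovic</a></th>
</tr>
<tr data-row="9">
    <th data-stat="player"><strong><a href="/players/a/abrinal01.html">Álex Abrines</a></strong></th>
</tr>
</table>
"""


def strong_links(markup):
    """Return the href of every <a> that is a direct child of a <strong>.

    strong_links is a hypothetical helper name, not a BeautifulSoup API.
    """
    soup = BeautifulSoup(markup, "html.parser")
    # "strong > a[href]" matches only <a> tags that have an href attribute
    # and whose immediate parent is a <strong> tag.
    return [a["href"] for a in soup.select("strong > a[href]")]


print(strong_links(html))  # ['/players/a/abrinal01.html']
```

If the "a" tag might sit deeper inside the "strong" tag (for example inside a nested "em"), the descendant selector "strong a[href]" would match it as well, which is closer to what the nested find_all loops do.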
