Extract HTML table cell text with BeautifulSoup

Question

I want to extract "Canada" based on the text "Place of Birth". How can I do this with BeautifulSoup?

<html>
    <table class="table1">
        <tbody>
            <tr>
                <td>Date(s) of Birth Used</td>
                <td>May 14, 1942</td>
            </tr>
            <tr>
                <td>Place of Birth</td>
                <td>Canada</td>
            </tr>
        </tbody>
    </table>
</html>
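A more direct route is sketched below. It is an illustrative alternative, not the method from the answer that follows. BeautifulSoup's find() accepts a string argument that matches a tag whose text is exactly that value, and find_next_sibling() then steps to the neighbouring cell. The sketch assumes the label and its value sit in adjacent <td> cells of the same row.

```python
from bs4 import BeautifulSoup

html = """<table class="table1">
  <tbody>
    <tr><td>Date(s) of Birth Used</td><td>May 14, 1942</td></tr>
    <tr><td>Place of Birth</td><td>Canada</td></tr>
  </tbody>
</table>"""

soup = BeautifulSoup(html, "html.parser")

# Find the <td> whose whole text is exactly "Place of Birth".
label_cell = soup.find("td", string="Place of Birth")

# Assumption: the value is in the next <td> of the same row.
value_cell = label_cell.find_next_sibling("td") if label_cell else None
place_of_birth = value_cell.get_text(strip=True) if value_cell else None

print(place_of_birth)  # Canada
```

Because string= is an exact match, a label cell with surrounding whitespace would not be found. In that case, pass a function such as lambda s: s is not None and s.strip() == "Place of Birth" instead.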

Tags: python, beautifulsoup

Answer


You could try this approach, which looks up the td values dynamically:

from bs4 import BeautifulSoup

contents = '''<html>
                <table class="table1">
                    <tbody>
                        <tr>
                            <td>Date(s) of Birth Used</td>
                            <td>May 14, 1942</td>
                        </tr>
                        <tr>
                            <td>Place of Birth</td>
                            <td>Canada</td>
                        </tr>
                    </tbody>
                </table>
            </html>'''

soup = BeautifulSoup(contents, 'html.parser')
table_div = soup.find(class_="table1")
td_val = table_div.find_all('td')
# Compare each cell's visible text rather than its raw HTML, so that
# attributes or extra whitespace inside a <td> don't break the match.
td_texts = [td.get_text(strip=True) for td in td_val]

# You can use input() instead of the fixed label below to take dynamic input;
# the code then prints the matching td and the td that follows it.
label = 'Place of Birth'

if label in td_texts:
    index_val = td_texts.index(label)
    print(td_val[index_val].get_text())
    # Guard against the label being the last cell in the table.
    if index_val + 1 < len(td_val):
        print(td_val[index_val + 1].get_text())

Output:

Place of Birth
Canada
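If you need to look up several labels, or take the label from input() as the comment in the answer suggests, you could build a label-to-value mapping once. The sketch below uses a hypothetical helper, table_to_dict, which is not part of BeautifulSoup. It assumes every row follows the same two-column label/value layout and skips rows with fewer than two <td> cells.

```python
from bs4 import BeautifulSoup


def table_to_dict(table):
    """Hypothetical helper: map each row's first <td> text to its second <td> text.

    Assumes a two-column label/value layout; rows with fewer than
    two <td> cells are skipped.
    """
    result = {}
    for row in table.find_all("tr"):
        cells = row.find_all("td")
        if len(cells) >= 2:
            label = cells[0].get_text(strip=True)
            result[label] = cells[1].get_text(strip=True)
    return result


contents = """<table class="table1"><tbody>
<tr><td>Date(s) of Birth Used</td><td>May 14, 1942</td></tr>
<tr><td>Place of Birth</td><td>Canada</td></tr>
</tbody></table>"""

soup = BeautifulSoup(contents, "html.parser")
info = table_to_dict(soup.find(class_="table1"))

label = "Place of Birth"  # could come from input() instead
print(info.get(label))  # Canada
```

Using dict.get() returns None for an unknown label instead of raising an error, which suits labels typed in by a user.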
