BeautifulSoup: scraping tables with and without an ID in Python

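Problem description

I am trying to scrape two pages that both contain tables. The table on the first URL can be selected with .table-translations (a CSS class selector), but the table on the second URL has no such identifier, so it is not scraped.

But if I don't include the selector, nothing gets scraped.

How can I use BeautifulSoup to scrape the data whether or not the table has an identifier?

Here is my code: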

import requests
from bs4 import BeautifulSoup


urls = ['http://www.mongols.eu/mongolian-language/mongolian-tale-six-silver-stars', 'http://www.mongols.eu/mongolian-language/mongolian-tale-yanzin-jaal']

for url in urls:
        print(url)
        out_fileName = url.rsplit('/', 1)[-1]
        out_mn = out_fileName + "_mn.txt"
        out_en = out_fileName + "_en.txt"

        soup = BeautifulSoup(requests.get(url).content, 'html.parser')

        all_data = []
        for row in soup.select('.table-translations tr')[1:]:
                mongolian, english = map(lambda t: t.get_text(strip=True), row.select('td')[1:])
                all_data.append((mongolian, english))

        for row in all_data:
                with open(out_mn, "a") as text_file:
                        text_file.write(row[0] + "\n")
                with open(out_en, "a") as text_file:
                        text_file.write(row[1] + "\n")

Tags: python, python-3.x, beautifulsoup

Solution

This script gets all the translations from both URLs. If other pages are structured differently, it will need adjusting:

import requests
from bs4 import BeautifulSoup


urls = ['http://www.mongols.eu/mongolian-language/mongolian-tale-six-silver-stars', 'http://www.mongols.eu/mongolian-language/mongolian-tale-yanzin-jaal']

for url in urls:
    print(url)

    soup = BeautifulSoup(requests.get(url).content, 'html.parser')

    all_data = []
    # Match every <tr> on the page (no class or ID needed) and skip the header row
    for row in soup.select('tr')[1:]:
        tds = [td.get_text(strip=True) for td in row.select('td')]
        # Rows with three cells: drop the first cell; otherwise expect exactly two
        mongolian, english = tds[1:] if len(tds) == 3 else tds

        print(mongolian)
        print(english)
        print('-' * 80)
        all_data.append((mongolian, english))

Prints:

http://www.mongols.eu/mongolian-language/mongolian-tale-six-silver-stars
Зургаан мөнгөн мичид
Six silver stars
--------------------------------------------------------------------------------
Эрт урьд цагт зургаан өнчин хүүхэд товцог толгой дээр наадан суудаг юм санжээ.
Long ago, there were six orphan brothers playing on the top of a hill.
--------------------------------------------------------------------------------

... and so on.
