Adding a title to the first column of a scraped table

Question

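I'm currently working on a school project in which I scrape race results from a cycling website. I've managed to build a scraper that loops over all the URLs containing results. I'd like to add the event title to the first column of each table, but I'm running into some difficulties.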

Here is my code:

# list of needed packages
import requests
from bs4 import BeautifulSoup
import time
import csv

# create a list of urls to scrape
urls = ['https://cqranking.com/men/asp/gen/race.asp?raceid=36151', 'https://cqranking.com/men/asp/gen/race.asp?raceid=36151']

# Generates a csv-file named cycling_results.csv, with wanted headers
with open('cycling_results.csv', 'w', newline='') as csvfile:
    writer = csv.writer(csvfile, delimiter=';')
    writer.writerow(['Start', 'Rank', '', '', '', 'Name', '', 'Team', '', 'Time', '', 'Points'])

    # loop through all urls in the list
    for url in urls:
        time.sleep(2)
        response = requests.get(url)
        data = response.content
        soup = BeautifulSoup(data, 'html.parser')
        # Find the title of the racing event
        titles = soup.find('title')
        for title in titles:
            writer.writerow(title)
        tables = soup.find_all('table')
        for table in tables:
            rows = table.find_all('tr')
            for row in rows:
                csv_row = []
                columns = row.find_all('td')
                for column in columns:
                    csv_row.append(column.get_text())
                writer.writerow(csv_row)

As a next step, I will add code to remove the empty rows.
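A minimal sketch of both goals, writing the event title as the first column of every row and skipping empty rows. It assumes the cell texts of each table have already been collected into lists of strings (in the scraper above, `[column.get_text() for column in row.find_all('td')]`) and that the title string comes from something like `soup.find('title').get_text(strip=True)`. The helper name `rows_with_title` and the sample data are hypothetical:

```python
import csv
import io


def rows_with_title(title, rows):
    """Yield each non-empty row with the event title prepended as column 1.

    `rows` is assumed to be a list of lists of cell strings (hypothetical
    input shape, e.g. built from row.find_all('td') in the scraper).
    """
    for cells in rows:
        # A <tr> without <td> cells gives [], and whitespace-only cells
        # carry no data, so both kinds of row are skipped.
        if not any(cell.strip() for cell in cells):
            continue
        yield [title] + cells


# Hypothetical sample data standing in for one scraped table
title = 'Example race'
rows = [[], ['1', 'Rider A', 'Team X'], [' ', ''], ['2', 'Rider B', 'Team Y']]

buf = io.StringIO()
writer = csv.writer(buf, delimiter=';')
writer.writerows(rows_with_title(title, rows))  # writerows accepts any iterable of rows
print(buf.getvalue())
```

With this approach the header row written before the loop would need an extra leading column name (for example a hypothetical 'Race') so the columns still line up.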

Thanks, regards, Kevin

Tags: python, beautifulsoup, export-to-csv

Answer


Change this part of the code:

titles = soup.find('title')
for title in titles:
    writer.writerow(title)

to:

# find() returns a single Tag (or None), not a list
title = soup.find('title')
writer.writerow([title.text])

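find returns only one element, not a list of elements, so there is nothing to loop over. Write the element's text, or whatever information you need from it, rather than the whole element, and pass it to writerow inside a list.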
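The original loop produced garbled rows because iterating over the Tag returned by find('title') yields its text child, which is a string, and csv.writer.writerow treats any iterable as a row, so a bare string becomes one field per character. A minimal standard-library reproduction (the title 'Tour' is only a placeholder):

```python
import csv
import io

buf = io.StringIO()
writer = csv.writer(buf, delimiter=';')

# A plain string is itself an iterable of characters,
# so each character is written as a separate field.
writer.writerow('Tour')

# Wrapping the string in a one-element list writes it as a single field.
writer.writerow(['Tour'])

print(buf.getvalue())
```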

