ValueError: 9 columns passed, passed data had 3 columns

Problem description

I just started learning web scraping, and 30 minutes in I ran into a problem while scraping a table from Wikipedia:

import requests
from bs4 import BeautifulSoup
import pandas as pd

start_url = 'https://en.wikipedia.org/wiki/The_Avengers_(2012_film)#Sequels'

downloaded_html = requests.get(start_url)

soup = BeautifulSoup(downloaded_html.text, "html.parser")

with open('downloaded.html', 'w', encoding="utf-8") as file:
    file.write(soup.prettify())

full_table = soup.select('table.wikitable tbody')[0]

table_head = full_table.select('tr th')

tabele_column = []
for element in table_head:
    colume_label = element.get_text(separator=" ", strip=True)
    colume_label = colume_label.replace(" ", "_")
    tabele_column.append(colume_label)

table_row = full_table.select('tr')
table_data = []
for index, element in enumerate(table_row):
    if index > 0:
        row_list = []
        values = element.select('td')
        for value in values:
            row_list.append(value.text.strip())
        table_data.append(row_list)
# print(table_data)

df = pd.DataFrame(table_data, columns=colume_label)
print(df)

I get the following error:

ValueError: 9 columns passed, passed data had 3 columns
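When narrowing down a mismatch like this, it can help to check the labels and the scraped rows before calling pd.DataFrame. The following is a minimal, stdlib-only sketch, assuming the labels and rows are plain Python lists shaped like tabele_column and table_data. The helper name check_row_widths is hypothetical, and the second sample row is made up to show a short row.

```python
def check_row_widths(columns, rows):
    """Return (row_index, cell_count) for every row whose width differs from len(columns)."""
    # A bare string is iterable too, so reject it explicitly:
    # len("Reference") would count characters, not labels.
    if isinstance(columns, str):
        raise TypeError(f"columns should be a list of labels, got the string {columns!r}")
    expected = len(columns)
    return [(i, len(row)) for i, row in enumerate(rows) if len(row) != expected]


if __name__ == "__main__":
    labels = ["Record_title", "Record_detail", "Reference"]
    rows = [
        ["Opening weekend for any film", "$207,438,708", "[212]"],
        ["May opening", "$207,438,708"],  # hypothetical row with a missing cell
    ]
    print(check_row_widths(labels, rows))  # [(1, 2)]
```

An empty list means every row matches the header count. A TypeError means a single string was passed where a list of labels was expected.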

Tags: python pandas web-scraping

Solution


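It looks like you are building the DataFrame with colume_label instead of tabele_column. After the header loop finishes, colume_label holds only the last header text, the string "Reference". pandas treated its 9 characters as 9 column labels, while every row in table_data has 3 cells. Pass the list instead: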

df = pd.DataFrame(table_data, columns=tabele_column)
print(df)
#                                          Record_title                          Record_detail   Reference
# 0                        Opening weekend for any film                           $207,438,708       [212]
# 1                           Opening week for any film                           $270,019,373       [213]
# 2        Opening weekend, adjusted for ticket pricing                         $207.4 million       [214]
# 3                      Theater average – wide release                                $47,698       [206]
# 4                     3D gross during opening weekend                           $108 million  [198][203]
# 5                   IMAX gross during opening weekend                          $15.3 million       [200]
# 6                         Second weekend for any film                           $103,052,274       [215]
# 7                  Monthly share of domestic earnings                          May 2012, 52%       [211]
# 8                            Highest cumulative gross                            2 – 43 days       [216]
# 9                   Days to reach $100*, $150 million                                2 days*       [217]
# 10  Days to reach $200, $250, $300, $350, $400, $4...  3, 6, 9, 10, 14, 17 days respectively       [217]
# 11                   Days to reach $500, $550 million                            23, 31 days  [208][217]
# 12                                        May opening                           $207,438,708       [218]
# 13               Opening weekend for a superhero film                           $207,438,708       [219]
# 14                    Highest-grossing superhero film                           $623,357,910       [220]
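Why does the message say 9 columns? The following pure-Python sketch replays the header loop from the question. It assumes the raw header texts are "Record title", "Record detail" and "Reference". These are reconstructed from the column names in the output above, not read from the live page.

```python
# Assumed raw header texts, reconstructed from the output's column names.
raw_headers = ["Record title", "Record detail", "Reference"]

tabele_column = []
for text in raw_headers:
    colume_label = text.replace(" ", "_")  # same normalisation as in the question
    tabele_column.append(colume_label)

# After the loop, colume_label is still bound to the LAST label only ...
print(colume_label)        # Reference
# ... and len() on a string counts characters, hence "9 columns passed".
print(len(colume_label))   # 9
print(list(colume_label))  # ['R', 'e', 'f', 'e', 'r', 'e', 'n', 'c', 'e']
# The list built by the loop has one entry per header, matching 3 cells per row.
print(len(tabele_column))  # 3
print(tabele_column)       # ['Record_title', 'Record_detail', 'Reference']
```

In the pandas version used in the question, the 9-character string was accepted as nine column labels. Newer pandas versions may reject a bare string with a TypeError instead. Either way, the fix is to pass the list.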
