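python - ValueError: 9 columns passed, passed data had 3 columns
Problem description
I just started learning web scraping, and about 30 minutes in I ran into a problem while scraping a table from a Wikipedia page.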
import requests
from bs4 import BeautifulSoup
import pandas as pd

start_url = 'https://en.wikipedia.org/wiki/The_Avengers_(2012_film)#Sequels'
downloaded_html = requests.get(start_url)
soup = BeautifulSoup(downloaded_html.text)

with open('downloaded.html', 'w', encoding="utf-8") as file:
    file.write(soup.prettify())

full_table = soup.select('table.wikitable tbody')[0]
table_head = full_table.select('tr th')

tabele_column = []
for element in table_head:
    colume_label = element.get_text(separator=" ", strip=True)
    colume_label = colume_label.replace(" ", "_")
    tabele_column.append(colume_label)

table_row = full_table.select('tr')
table_data = []
for index, element in enumerate(table_row):
    if index > 0:
        row_list = []
        values = element.select('td')
        for value in values:
            row_list.append(value.text.strip())
        table_data.append(row_list)

# print(table_data)
df = pd.DataFrame(table_data, columns=colume_label)
print(df)
I get the following error:
ValueError: 9 columns passed, passed data had 3 columns
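Solution
I suspect you are passing colume_label (the loop variable, which holds a single header string) instead of tabele_column (the list of all header labels) when building the DataFrame: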
df = pd.DataFrame(table_data, columns=tabele_column)
print(df)
#     Record_title                                       Record_detail                          Reference
#  0  Opening weekend for any film                       $207,438,708                           [212]
#  1  Opening week for any film                          $270,019,373                           [213]
#  2  Opening weekend, adjusted for ticket pricing       $207.4 million                         [214]
#  3  Theater average – wide release                     $47,698                                [206]
#  4  3D gross during opening weekend                    $108 million                           [198][203]
#  5  IMAX gross during opening weekend                  $15.3 million                          [200]
#  6  Second weekend for any film                        $103,052,274                           [215]
#  7  Monthly share of domestic earnings                 May 2012, 52%                          [211]
#  8  Highest cumulative gross                           2 – 43 days                            [216]
#  9  Days to reach $100*, $150 million                  2 days*                                [217]
# 10  Days to reach $200, $250, $300, $350, $400, $4...  3, 6, 9, 10, 14, 17 days respectively  [217]
# 11  Days to reach $500, $550 million                   23, 31 days                            [208][217]
# 12  May opening                                        $207,438,708                           [218]
# 13  Opening weekend for a superhero film               $207,438,708                           [219]
# 14  Highest-grossing superhero film                    $623,357,910                           [220]
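The mix-up between the two names would also explain where the 9 in the error message probably comes from: after the loop, colume_label still holds the last header string, "Reference", which is nine characters long, while each scraped row has three values. The sketch below reproduces this without any network access or pandas. The header texts and the sample row are stand-ins taken from the output above, and check_columns is a hypothetical helper, not part of pandas or BeautifulSoup.

```python
# Stand-ins for the scraped header cells and the first data row
# (assumed from the DataFrame output shown above).
header_texts = ["Record title", "Record detail", "Reference"]
first_row = ["Opening weekend for any film", "$207,438,708", "[212]"]

tabele_column = []
for text in header_texts:
    colume_label = text.replace(" ", "_")
    tabele_column.append(colume_label)

# The loop variable keeps only its last value: one string, not a list.
print(repr(colume_label), len(colume_label))
print(tabele_column, len(tabele_column))


def check_columns(columns, row):
    """Hypothetical helper: fail early if `columns` cannot label `row`."""
    if isinstance(columns, str):
        raise TypeError(
            f"columns should be a list of labels, got the string {columns!r}"
        )
    if len(columns) != len(row):
        raise ValueError(f"{len(columns)} labels for {len(row)} values")
    return list(columns)


# Passes for the list of labels; passing colume_label would raise TypeError.
columns = check_columns(tabele_column, first_row)
```

Calling a check like this just before pd.DataFrame(table_data, columns=...) is one way to catch the wrong variable with a clearer message; it is an optional safeguard, not something the fix above requires.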