Making a pandas DataFrame with multi-row headers

Problem description

The pandas DataFrame I am trying to use does not print correctly.

from urllib.request import urlopen
from bs4 import BeautifulSoup
import pandas as pd

pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

year = 2021
url = "https://www.basketball-reference.com/leagues/NBA_{}_per_game.html".format(year)
html = urlopen(url)
soup = BeautifulSoup(html, features='html.parser')
soup.findAll('tr', limit=2)
headers = [th.getText() for th in soup.findAll('tr', limit=2)[0].findAll('th')]
headers = headers[1:]
rows = soup.findAll('tr')[1:]
player_stats = [[td.getText() for td in rows[i].findAll('td')] for i in range(len(rows))]
stats = pd.DataFrame(player_stats, columns=headers)
stats.head(10)

with open('stats.txt', 'w') as f:
    f.write(str(stats))

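In the output, it prints the first few column headers along with all the rows. Then, once all the rows are done, it starts over with the next set of headers.

Tags: python, pandas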

Solution


I would take Octav's point a step further here: let pandas not only write the file, but also parse the table.

import pandas as pd

pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

year = 2021
url = "https://www.basketball-reference.com/leagues/NBA_{}_per_game.html".format(year)
stats = pd.read_html(url)[0]
stats = stats[stats['Rk'].ne('Rk')] #<-- removes rows with the "headers"

stats.head(10)

stats.to_csv('stats.csv', index=False)
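
To see what the stats['Rk'].ne('Rk') filter does without hitting the network, here is a minimal sketch on a hand-built frame. The sample rows and the names sample, clean and text are made up for illustration; only the filter itself comes from the code above. It also uses to_string(), which, as far as I know, lays all columns out side by side rather than wrapping them into blocks the way str(stats) can, in case a plain-text file is still wanted instead of a CSV.

```python
import pandas as pd

# Hypothetical sample mimicking the scraped table: the site repeats the
# header row inside the table body, so it shows up as an ordinary data row.
sample = pd.DataFrame(
    {
        "Rk": ["1", "2", "Rk", "3"],
        "Player": ["Player A", "Player B", "Player", "Player C"],
        "PTS": ["12.3", "8.1", "PTS", "20.4"],
    }
)

# Keep only the rows whose 'Rk' cell is not the literal header text 'Rk'.
clean = sample[sample["Rk"].ne("Rk")].reset_index(drop=True)

# Render the whole frame as one block of text, one line per row.
text = clean.to_string(index=False)
print(text)
```

The same text could then be written out with f.write(text) in place of f.write(str(stats)).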
