Converting a tbody to a DataFrame without headers

Question

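I'm using the requests library to fetch a page and find a table in it. All I want to do is convert the table's tbody into a DataFrame with numbered columns (0, 1, 2, 3, ...), because the table header consists of navigation buttons and other things that give pandas.read_html() grief when it reads the whole table element.

For example, using this link, I want to get the stats table into a DataFrame. This is how I currently have it set up: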

    soup = bs(requests.get(url).text, 'lxml')
    table = soup.findAll('table', {'class':'rgMasterTable'})[0]

    column_names = table.findAll('th', {'scope':'col'})
    column_names = [col_name.text for col_name in column_names]
    
    table_body = table.find('tbody')

    df = pd.read_html(table_body)

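...but I know that's not the right way to do it. I'm storing the column names so I can apply them once the data is in the DataFrame. Any suggestions? Thanks!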

Tags: python, pandas, python-requests

Solution


For the link given, if this is the right table...

Maybe:

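    from io import StringIO

    import pandas as pd
    import requests

    url = 'https://www.fangraphs.com/leaders.aspx?pos=all&stats=bat&lg=all&qual=0&type=8&season=2014&month=1000&season1=2014&ind=0&team=2&rost=0&age=0&filter=&players=0&startdate=2014-04-06&enddate=2014-04-06&sort=3,d&page=1_50'

    page = requests.get(url)
    page.raise_for_status()

    # read_html returns one DataFrame per <table> found on the page;
    # StringIO avoids the literal-string deprecation in newer pandas
    tables = pd.read_html(StringIO(page.text))

    # the stats table was at index 16 for this page; drop the junk row at the bottom
    df = tables[16][:-1]
    # the header parses as a MultiIndex; drop its top level
    df.columns = df.columns.droplevel()

    print(df)

Output:
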
         #               Name  G PA HR  R RBI SB    BB%     K%  ...   AVG   OBP  \
    0    1        David Lough  1  5  0  0   0  0   0.0%   0.0%  ...  .000  .000
    1    2      Nick Markakis  1  4  0  1   0  0   0.0%   0.0%  ...  .500  .500
    2    3       Matt Wieters  1  4  1  1   1  0   0.0%   0.0%  ...  .250  .250
    3    4        Nelson Cruz  1  4  0  0   1  0  25.0%  25.0%  ...  .333  .500
    4    5      Ryan Flaherty  1  4  0  0   0  0  25.0%   0.0%  ...  .333  .500
    5    6         Adam Jones  1  4  0  1   1  0   0.0%  25.0%  ...  .333  .250
    6    7        Chris Davis  1  4  0  0   0  0   0.0%  25.0%  ...  .250  .250
    7    8  Steve Lombardozzi  1  4  0  0   0  0   0.0%  25.0%  ...  .250  .250
    8    9    Jonathan Schoop  1  4  0  0   0  0   0.0%  25.0%  ...  .000  .000
    9   10       Tommy Hunter  1  0  0  0   0  0   0.0%   0.0%  ...  .000  .000
    10  11      Chris Tillman  1  0  0  0   0  0   0.0%   0.0%  ...  .000  .000

          SLG  wOBA xwOBA  wRC+  BsR   Off   Def   WAR
    0    .000  .000   NaN  -100  0.1  -1.1   0.1  -0.1
    1   1.000  .632   NaN   320  0.0   1.0   0.0   0.1
    2   1.000  .534   NaN   252  0.0   0.7   0.0   0.1
    3    .667  .493   NaN   223  0.0   0.5  -0.1   0.1
    4    .333  .395   NaN   155  0.0   0.3   0.0   0.0
    5    .667  .321   NaN   103  0.0   0.0   0.0   0.0
    6    .250  .223   NaN    35  0.0  -0.3   0.0   0.0
    7    .250  .223   NaN    35  0.0  -0.3   0.0   0.0
    8    .000  .000   NaN  -100  0.0  -1.0   0.1  -0.1
    9    .000  .000   NaN   NaN  0.0   0.0   0.0   0.0
    10   .000  .000   NaN   NaN  0.0   0.0   0.0   0.0
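
If you would rather do what the question literally asks (build the frame from the tbody only and get numbered columns), one possible sketch is to pull the cell text out with BeautifulSoup and hand the rows to pd.DataFrame. This is not the approach from the answer above, just an alternative. SAMPLE_HTML is a made-up stand-in for the real page, and the helper name tbody_to_frame is hypothetical; only the rgMasterTable class and the scope="col" header lookup come from the question.

```python
import pandas as pd
from bs4 import BeautifulSoup

# Hypothetical stand-in for the downloaded page: a junk navigation row in
# the header, the real column names below it, and the data in <tbody>.
SAMPLE_HTML = """
<table class="rgMasterTable">
  <thead>
    <tr><th colspan="3">Page 1 of 1 | Next</th></tr>
    <tr><th scope="col">#</th><th scope="col">Name</th><th scope="col">HR</th></tr>
  </thead>
  <tbody>
    <tr><td>1</td><td>David Lough</td><td>0</td></tr>
    <tr><td>2</td><td>Matt Wieters</td><td>1</td></tr>
  </tbody>
</table>
"""


def tbody_to_frame(html, table_class="rgMasterTable"):
    """Return (df, column_names) for the first table with the given class."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", {"class": table_class})
    if table is None:
        raise ValueError(f"no table with class {table_class!r}")

    # Keep the real column names for later, as in the question.
    column_names = [
        th.get_text(strip=True)
        for th in table.find_all("th", {"scope": "col"})
    ]

    # Fall back to the whole table if the markup has no explicit <tbody>.
    body = table.find("tbody") or table
    rows = []
    for tr in body.find_all("tr"):
        cells = [td.get_text(strip=True) for td in tr.find_all("td")]
        if cells:  # skip rows that contain only <th> cells
            rows.append(cells)

    # No columns= argument, so pandas labels the columns 0, 1, 2, ...
    return pd.DataFrame(rows), column_names


df, column_names = tbody_to_frame(SAMPLE_HTML)
print(df)

# Apply the stored names only if they line up with the data.
if len(column_names) == df.shape[1]:
    df.columns = column_names
print(df)
```

Every value comes back as a string this way, so numeric columns would still need converting (for example with pd.to_numeric) before doing any arithmetic on them.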
