Pandas, BeautifulSoup - iterating and writing multiple pages to Excel

Problem description

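I'm collecting a bunch of NCAA soccer stats and dumping them into an Excel spreadsheet. However, the win/loss/tie (WLT) data spans multiple pages, so I iterate over them. But for WLT, only the last page of the iteration (4 of the 204 schools) ends up stored in Excel. How do I get all 5 pages into the 'WLT' sheet in Excel? Thanks for your help.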


    import requests
    import pandas as pd
    from bs4 import BeautifulSoup
    import re
    import xlsxwriter
    import numpy as np
    import urllib.request


    shutouts = "https://www.ncaa.com/stats/soccer-men/d1/current/team/31"
    shutouts = pd.read_html(shutouts)[0] 

    SOG = 'https://www.ncaa.com/stats/soccer-men/d1/current/team/977'
    SOG = pd.read_html(SOG)[0]

    # players stats
    shutouts_p = 'https://www.ncaa.com/stats/soccer-men/d1/current/individual/1170'
    shutouts_p = pd.read_html(shutouts_p)[0]

    #Win Loss Tie data
    max_page_num = 6
    for i in range(1,max_page_num):  
        print('page:', i)
        page_num = str(i)
        source = "https://www.ncaa.com/stats/soccer-men/d1/current/team/33/p" + page_num
        WLT = pd.read_html(source)
        WLT = WLT[0]


    with pd.ExcelWriter('ncaastats.xlsx') as writer:  
        shutouts.to_excel(writer, sheet_name='shutouts')
        shutouts_p.to_excel(writer, sheet_name='shutouts_p')
        SOG.to_excel(writer, sheet_name='SOG')
        WLT.to_excel(writer, sheet_name='WLT')

Tags: python-3.x, pandas, beautifulsoup

Solution


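To get all 204 records from the 5 pages into a single pandas DataFrame, you need to accumulate each page's table inside the loop instead of overwriting WLT on every iteration.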

Code

import pandas as pd

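# Collect one DataFrame per page here
pages = []
# Win Loss Tie data
max_page_num = 6
for i in range(1, max_page_num):
    print('page:', i)
    page_num = str(i)
    source = "https://www.ncaa.com/stats/soccer-men/d1/current/team/33/p" + page_num
    WLT = pd.read_html(source)[0]
    # Keep this page's table instead of overwriting the previous one
    pages.append(WLT)

# DataFrame.append was removed in pandas 2.0; pd.concat combines all pages at once
df = pd.concat(pages, ignore_index=True)

print(df)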

Output

page: 1
page: 2
page: 3
page: 4
page: 5
    Rank                Team  Won  Loss  Tied   Pct.
0      1        Missouri St.   18     1     1  0.925
1      2          Georgetown   20     1     3  0.896
2      -            Virginia   21     2     1  0.896
3      4   Saint Mary's (CA)   16     2     0  0.889
4      5                 SMU   18     2     1  0.881
5      6             Clemson   18     2     2  0.864
6      7       New Hampshire   15     2     3  0.825
7      8            Campbell   17     3     2  0.818
8      9          Washington   17     4     0  0.810
9     10                 UCF   15     3     2  0.800
10    11            Marshall   16     3     3  0.795
11    12           Seattle U   16     3     4  0.783
12    13                Yale   13     3     2  0.778
13    14             Indiana   15     3     4  0.773
14    15        Oral Roberts   13     4     0  0.765
15    16            Stanford   14     3     5  0.750
16    17         Wake Forest   16     5     2  0.739
17    18        Rhode Island   14     4     3  0.738
18    19                Navy   12     4     1  0.735
19    20     St. John's (NY)   14     5     1  0.725
20    21                 UIC   13     5     0  0.722
21    22            Penn St.   12     4     3  0.711
22    23    UC Santa Barbara   15     5     4  0.708
23    24            UC Davis   13     5     2  0.700
24     -           Charlotte   12     4     4  0.700
25     -         Georgia St.   12     4     4  0.700
26    27          Providence   16     7     0  0.696
27    28           San Diego   12     5     1  0.694
28     -                 FIU   10     3     5  0.694
29    30                Iona   14     6     1  0.690
..   ...                 ...  ...   ...   ...    ...
174  175            Delaware    3     9     3  0.300
175  176         USC Upstate    5    12     0  0.294
176    -       Robert Morris    4    11     2  0.294
177    -         Stony Brook    4    11     2  0.294
178    -                 UIW    5    12     0  0.294
179  180        Western Ill.    5    13     1  0.289
180  181           Wisconsin    3    11     4  0.278
181    -             Liberty    5    13     0  0.278
182    -       San Diego St.    4    12     2  0.278
183  184           Boston U.    4    12     1  0.265
184    -       UNC Asheville    4    12     1  0.265
185  186             Wofford    4    13     1  0.250
186    -          Valparaiso    4    13     1  0.250
187    -            American    3    11     2  0.250
188    -        George Mason    4    13     1  0.250
189    -            Davidson    3    11     2  0.250
190    -        Michigan St.    3    12     3  0.250
191  192            Monmouth    3    12     2  0.235
192    -                 UAB    3    12     2  0.235
193  194        Old Dominion    3    11     1  0.233
194  195        Sacred Heart    2    11     3  0.219
195  196  Col. of Charleston    2    12     2  0.188
196  197          Holy Cross    3    15     0  0.167
197    -   Purdue Fort Wayne    3    15     0  0.167
198  199       San Francisco    2    14     1  0.147
199  200          Evansville    2    15     1  0.139
200  201            Canisius    2    15     0  0.118
201  202   Central Conn. St.    1    13     1  0.100
202  203                 VMI    1    16     0  0.059
203  204             Harvard    0    14     1  0.033

[204 rows x 6 columns]
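
The question also asked how to get all five pages into the 'WLT' sheet. The following is a minimal sketch of that last step, not part of the original answer. The helper names combine_pages, fetch_wlt, and write_sheets are hypothetical, fetch_wlt assumes the pages follow the ".../p<number>" URL pattern used in the question, and writing an .xlsx file assumes an Excel engine such as openpyxl or xlsxwriter is installed.

```python
import pandas as pd


def combine_pages(frames):
    """Stack per-page tables into one DataFrame with a fresh 0..n-1 index.

    Raises ValueError if `frames` is empty (pd.concat needs at least one).
    """
    return pd.concat(list(frames), ignore_index=True)


def fetch_wlt(base_url, last_page):
    """Read the first table on pages 1..last_page and combine them.

    Assumption: each page lives at base_url + "/p<number>", as in the question.
    Needs network access and an HTML parser such as lxml.
    """
    frames = [pd.read_html(f"{base_url}/p{page}")[0]
              for page in range(1, last_page + 1)]
    return combine_pages(frames)


def write_sheets(path, sheets):
    """Write a {sheet_name: DataFrame} mapping into one Excel workbook."""
    with pd.ExcelWriter(path) as writer:
        for name, frame in sheets.items():
            # index=False drops the 0..n-1 row labels from the sheet
            frame.to_excel(writer, sheet_name=name, index=False)
```

With these helpers, calling write_sheets('ncaastats.xlsx', {'WLT': fetch_wlt('https://www.ncaa.com/stats/soccer-men/d1/current/team/33', 5)}) should produce a 'WLT' sheet holding every page. The other tables from the question (shutouts, shutouts_p, SOG) could be added to the same dictionary so that everything lands in one workbook.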
