
Problem description

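I am trying to scrape data from the following web page: https://www.cricbuzz.com/live-cricket-scorecard/10711/aus-vs-ind-1st-test-india-in-australia-test-series-2011-12. I need the scorecard in table format. Can anyone help me? I am using Python 3. I am new to web scraping and not very familiar with the internal structure of web pages. Thanks in advance!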

I tried using BeautifulSoup together with urllib2 and the like, but did not get anywhere.

Tags: python, web-scraping

Solution

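You can use pandas' read_html(). It returns a list of DataFrames, one per table found on the page. What you do with them from there is up to you. You will probably need to tidy the data, but here I just dump everything into one big table to show you.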

import pandas as pd

url = 'https://m.cricbuzz.com/live-cricket-scorecard/10711/aus-vs-ind-1st-test-india-in-australia-test-series-2011-12'
# read_html parses every <table> on the page into its own DataFrame
dfs = pd.read_html(url)

# Stack all of the tables into one DataFrame
result = pd.concat(dfs)

Output:

print(result.to_string())
                      0                     1                     2                     3                     4
0               Batting                     R                     B                    4s                    6s
0              Ed Cowan                    68                   177                     7                     0
1  c M Dhoni b R Ashwin  c M Dhoni b R Ashwin  c M Dhoni b R Ashwin  c M Dhoni b R Ashwin  c M Dhoni b R Ashwin
0          David Warner                    37                    49                     4                     1
1   c M Dhoni b U Yadav   c M Dhoni b U Yadav   c M Dhoni b U Yadav   c M Dhoni b U Yadav   c M Dhoni b U Yadav
0           Shaun Marsh                     0                     6                     0                     0
1   c V Kohli b U Yadav   c V Kohli b U Yadav   c V Kohli b U Yadav   c V Kohli b U Yadav   c V Kohli b U Yadav
0         Ricky Ponting                    62                    94                     6                     0
1  c V Laxman b U Yadav  c V Laxman b U Yadav  c V Laxman b U Yadav  c V Laxman b U Yadav  c V Laxman b U Yadav
0        Michael Clarke                    31                    68                     5                     0
1              b Z Khan              b Z Khan              b Z Khan              b Z Khan              b Z Khan
0        Michael Hussey                     0                     1                     0                     0
1    c M Dhoni b Z Khan    c M Dhoni b Z Khan    c M Dhoni b Z Khan    c M Dhoni b Z Khan    c M Dhoni b Z Khan
0           Brad Haddin                    27                    69                     1                     0
1   c V Sehwag b Z Khan   c V Sehwag b Z Khan   c V Sehwag b Z Khan   c V Sehwag b Z Khan   c V Sehwag b Z Khan
0          Peter Siddle                    41                   100                     4                     0
1    c M Dhoni b Z Khan    c M Dhoni b Z Khan    c M Dhoni b Z Khan    c M Dhoni b Z Khan    c M Dhoni b Z Khan
0       James Pattinson                    18                    54                     2                     0
1               not out               not out               not out               not out               not out
0        Ben Hilfenhaus                    19                    32                     3                     0
1  c V Kohli b R Ashwin  c V Kohli b R Ashwin  c V Kohli b R Ashwin  c V Kohli b R Ashwin  c V Kohli b R Ashwin
0           Nathan Lyon                     6                    11                     1                     0
1            b R Ashwin            b R Ashwin            b R Ashwin            b R Ashwin            b R Ashwin
0                Bowler                     O                     M                     R                     W
1           Zaheer Khan                    31                     6                    77                     4
2         Ishant Sharma                    24                     7                    48                     0
3           Umesh Yadav                    26                     5                   106                     3
4   Ravichandran Ashwin                    29                     3                    81                     3
0                  Home           Live Scores                   NaN                   NaN                   NaN
1              Schedule                  News                   NaN                   NaN                   NaN
2            Editorials                Photos                   NaN                   NaN                   NaN
3              Archives               Players                   NaN                   NaN                   NaN
4              Rankings                Series                   NaN                   NaN                   NaN
5                  Poll                Videos                   NaN                   NaN                   NaN
6          Points Table            Contact Us                   NaN                   NaN                   NaN
7       Cricbuzz TV Ads    Careers @ Cricbuzz                   NaN                   NaN                   NaN
8           Mobile Apps    This day that year                   NaN                   NaN                   NaN
9          Wickets Zone                   NaN                   NaN                   NaN                   NaN
0           Mobile Apps       Social Channels                   NaN                   NaN                   NaN
1                iPhone              facebook                   NaN                   NaN                   NaN
2               Android               twitter                   NaN                   NaN                   NaN
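
The dump above mixes the scorecard with the site's navigation menus, and every dismissal is repeated across all five columns. As a rough sketch of the tidying mentioned earlier, the snippet below keeps only the per-batsman tables and moves the dismissal text into its own column. It assumes the layout visible in the output above (each batsman arrives as a separate two-row, five-column DataFrame), which may not hold for other pages. The names `tidy_batting`, `HEADER` and `sample_dfs` are hypothetical, and `sample_dfs` is hand-built from a few rows of the output so the example runs without network access.

```python
import pandas as pd

# Column names taken from the "Batting  R  B  4s  6s" row in the output above
HEADER = ["Batting", "R", "B", "4s", "6s"]


def tidy_batting(dfs):
    """Build one batting table from the small per-batsman frames.

    Assumption: each batsman is a two-row, five-column frame whose
    second row repeats the dismissal text in every column.
    """
    records = []
    for df in dfs:
        if df.shape != (2, 5):
            continue  # not a two-row, five-column frame
        stats, dismissal = df.iloc[0], df.iloc[1]
        if dismissal.nunique() != 1:
            continue  # second row is not one repeated dismissal string
        record = dict(zip(HEADER, stats))
        record["Dismissal"] = dismissal.iloc[0]
        records.append(record)

    batting = pd.DataFrame(records, columns=HEADER + ["Dismissal"])
    # Scraped cells are typically strings, so convert the numeric columns
    numeric_cols = ["R", "B", "4s", "6s"]
    batting[numeric_cols] = batting[numeric_cols].apply(pd.to_numeric)
    return batting


# Hand-built stand-ins for a few of the frames pd.read_html(url) returned
sample_dfs = [
    pd.DataFrame([HEADER]),
    pd.DataFrame([["Ed Cowan", "68", "177", "7", "0"],
                  ["c M Dhoni b R Ashwin"] * 5]),
    pd.DataFrame([["David Warner", "37", "49", "4", "1"],
                  ["c M Dhoni b U Yadav"] * 5]),
    pd.DataFrame([["Home", "Live Scores"], ["Schedule", "News"]]),
]

batting = tidy_batting(sample_dfs)
print(batting.to_string(index=False))
```

With the real page you would pass the `dfs` list from `pd.read_html(url)` instead of `sample_dfs`. The bowling table could be handled in a similar way by picking the frame whose first cell is "Bowler" and promoting its first row to column names.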
