python - Scraping data from a web page
Problem description
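I'm trying to scrape data from the following web page: https://www.cricbuzz.com/live-cricket-scorecard/10711/aus-vs-ind-1st-test-india-in-australia-test-series-2011-12 and I need the scorecard in table format. Can anyone help me? I'm using Python 3. I'm new to web scraping and not very familiar with how web pages are structured internally. Thanks in advance!
I tried using BeautifulSoup together with urllib2 and similar libraries, but didn't get anywhere.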
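One likely stumbling block: urllib2 exists only in Python 2; in Python 3 its functionality lives in urllib.request. Below is a minimal sketch of the download-and-parse approach in Python 3 using only the standard library, with html.parser.HTMLParser swapped in for BeautifulSoup to collect the cells of every <table>. The names fetch_html, TableParser and parse_tables are hypothetical, and the browser-like User-Agent header is an assumption, not something the site is known to require. The parser assumes reasonably well-formed markup and does not handle nested tables.

```python
from __future__ import annotations

import urllib.request
from html.parser import HTMLParser


def fetch_html(url: str) -> str:
    """Download a page with urllib.request (Python 3's replacement for urllib2)."""
    # Assumption: a browser-like User-Agent, since some sites reject Python's default one.
    req = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urllib.request.urlopen(req) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


class TableParser(HTMLParser):
    """Collect every <table> as a list of rows, each row a list of cell strings."""

    def __init__(self) -> None:
        super().__init__()
        self.tables: list[list[list[str]]] = []
        self._row: list[str] | None = None   # row currently being built
        self._cell: list[str] | None = None  # text fragments of the current cell

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Join the cell's text and collapse runs of whitespace
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.tables[-1].append(self._row)
            self._row = None

    def handle_data(self, data):
        # Text outside a cell (menus, headings, scripts) is ignored
        if self._cell is not None:
            self._cell.append(data)


def parse_tables(html: str) -> list[list[list[str]]]:
    parser = TableParser()
    parser.feed(html)
    parser.close()
    return parser.tables
```

Against the live page this would be called as parse_tables(fetch_html(url)), giving one list of rows per table; whether the scorecard is served as plain <table> markup depends on the site at the time you run it.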
Solution
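You can use pandas' read_html(). It returns a list of DataFrames, one per table on the page, and what you do with them from there is up to you. You will probably need to tidy the data a bit; here I simply dumped them all into one big table to show you what comes back.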
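import pandas as pd

url = 'https://m.cricbuzz.com/live-cricket-scorecard/10711/aus-vs-ind-1st-test-india-in-australia-test-series-2011-12'

# read_html() parses every <table> on the page into its own DataFrame
# (it needs lxml, or BeautifulSoup4 plus html5lib, to be installed)
dfs = pd.read_html(url)

# Stack all the tables into one DataFrame just to inspect them
result = pd.concat(dfs)
print(result.to_string())
Output: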
0 1 2 3 4
0 Batting R B 4s 6s
0 Ed Cowan 68 177 7 0
1 c M Dhoni b R Ashwin c M Dhoni b R Ashwin c M Dhoni b R Ashwin c M Dhoni b R Ashwin c M Dhoni b R Ashwin
0 David Warner 37 49 4 1
1 c M Dhoni b U Yadav c M Dhoni b U Yadav c M Dhoni b U Yadav c M Dhoni b U Yadav c M Dhoni b U Yadav
0 Shaun Marsh 0 6 0 0
1 c V Kohli b U Yadav c V Kohli b U Yadav c V Kohli b U Yadav c V Kohli b U Yadav c V Kohli b U Yadav
0 Ricky Ponting 62 94 6 0
1 c V Laxman b U Yadav c V Laxman b U Yadav c V Laxman b U Yadav c V Laxman b U Yadav c V Laxman b U Yadav
0 Michael Clarke 31 68 5 0
1 b Z Khan b Z Khan b Z Khan b Z Khan b Z Khan
0 Michael Hussey 0 1 0 0
1 c M Dhoni b Z Khan c M Dhoni b Z Khan c M Dhoni b Z Khan c M Dhoni b Z Khan c M Dhoni b Z Khan
0 Brad Haddin 27 69 1 0
1 c V Sehwag b Z Khan c V Sehwag b Z Khan c V Sehwag b Z Khan c V Sehwag b Z Khan c V Sehwag b Z Khan
0 Peter Siddle 41 100 4 0
1 c M Dhoni b Z Khan c M Dhoni b Z Khan c M Dhoni b Z Khan c M Dhoni b Z Khan c M Dhoni b Z Khan
0 James Pattinson 18 54 2 0
1 not out not out not out not out not out
0 Ben Hilfenhaus 19 32 3 0
1 c V Kohli b R Ashwin c V Kohli b R Ashwin c V Kohli b R Ashwin c V Kohli b R Ashwin c V Kohli b R Ashwin
0 Nathan Lyon 6 11 1 0
1 b R Ashwin b R Ashwin b R Ashwin b R Ashwin b R Ashwin
0 Bowler O M R W
1 Zaheer Khan 31 6 77 4
2 Ishant Sharma 24 7 48 0
3 Umesh Yadav 26 5 106 3
4 Ravichandran Ashwin 29 3 81 3
0 Home Live Scores NaN NaN NaN
1 Schedule News NaN NaN NaN
2 Editorials Photos NaN NaN NaN
3 Archives Players NaN NaN NaN
4 Rankings Series NaN NaN NaN
5 Poll Videos NaN NaN NaN
6 Points Table Contact Us NaN NaN NaN
7 Cricbuzz TV Ads Careers @ Cricbuzz NaN NaN NaN
8 Mobile Apps This day that year NaN NaN NaN
9 Wickets Zone NaN NaN NaN NaN
0 Mobile Apps Social Channels NaN NaN NaN
1 iPhone facebook NaN NaN NaN
2 Android twitter NaN NaN NaN
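The dump above suggests a shape to exploit when tidying: each batter appears to be a separate two-row table (row 0 holds the name and figures, row 1 repeats the dismissal text across every column), the bowling table carries its header as row 0, and the trailing tables are site navigation. The sketch below assumes exactly that shape, read off the printed output rather than the page's HTML; tidy_batting and promote_header are hypothetical helper names.

```python
import pandas as pd


def tidy_batting(frames):
    """Merge per-batter two-row tables into one batting scorecard.

    Assumption (from the printed output): row 0 = name, R, B, 4s, 6s and
    row 1 = the dismissal text repeated across all columns.
    """
    records = []
    for df in frames:
        # Keep only 2-row, 5+-column tables whose second row is one repeated value
        if df.shape[0] != 2 or df.shape[1] < 5 or df.iloc[1].nunique() != 1:
            continue
        name, runs, balls, fours, sixes = df.iloc[0, :5]
        records.append({"Batter": name, "Dismissal": df.iloc[1, 0],
                        "R": runs, "B": balls, "4s": fours, "6s": sixes})
    out = pd.DataFrame(records, columns=["Batter", "Dismissal", "R", "B", "4s", "6s"])
    num_cols = ["R", "B", "4s", "6s"]
    out[num_cols] = out[num_cols].apply(pd.to_numeric)
    return out


def promote_header(df):
    """Use the first row (e.g. 'Bowler O M R W') as column labels."""
    out = df.iloc[1:].copy()
    out.columns = list(df.iloc[0])
    out = out.reset_index(drop=True)
    # Every column after the bowler's name is assumed to be numeric
    for col in out.columns[1:]:
        out[col] = pd.to_numeric(out[col])
    return out
```

With the dfs list from read_html(), this would be called as tidy_batting(dfs) for the batting card and promote_header(next(df for df in dfs if df.iloc[0, 0] == "Bowler")) for the bowling figures. Matching on structure rather than on list positions is a deliberate choice, since the number and order of tables can change if the page layout does.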