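python - How do I parse a table using Python?
Problem description
I'm trying to parse a table; I have each row of the table in table_rows[0-8]. I can't figure out how to separate everything into individual values. The page I'm scraping is on an internal work site, but the table I want to scrape is shown below.
Code: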
from selenium import webdriver
from bs4 import BeautifulSoup

options = webdriver.ChromeOptions()
options.add_argument('headless')
driver = webdriver.Chrome(r'C:\Users\wendle\BrowserDrivers\chromedriver.exe', options=options)
driver.get(wsr)  # wsr holds the URL of the internal page
html = driver.page_source
soup = BeautifulSoup(html, 'lxml')
soup.prettify()
table = soup.find_all('table')
table_rows = table[2].find_all('tr')
<table border="0" bordercolor="black" cellspacing="0" cellpadding="1" bgcolor="white" style="border-collapse:collapse"><tbody><tr><td colspan="5" bgcolor="black" valign="top"><font face="arial" size="3" color="white"><b>DIFFUSION</b> <a href="../scrape/scrape_area.php?area=DIFFUSION"><font face="arial" size="1" color="#FFAAAA">SCRAPE</font></a></font></td><td colspan="2" bgcolor="black"><font face="arial" size="2" color="#888888"> </font></td></tr><tr style="background-color:black; color:#888888;"><th valign="top"><font face="arial" size="2"><a class="groupheader" href="downtools.php?orderby=toolid&noboth=1">ToolId</a>
</font></th><th valign="top"><font face="arial" size="2"><a href="downtools.php?orderby=type&noboth=1">Type</a>
</font></th><th valign="top"><font face="arial" size="2"><a href="downtools.php?orderby=status&noboth=1">Status</a>
</font></th><th valign="top"><font face="arial" size="2"><a href="downtools.php?orderby=datetime&noboth=1">Date/Time</a>
</font></th><th valign="top"><font face="arial" size="2"><a href="downtools.php?orderby=datetime&noboth=1">Min</a>
</font></th><th valign="top"><font face="arial" size="2"><a href="downtools.php?orderby=employee&noboth=1">Employee</a>
</font></th><th valign="top"><font face="arial" size="2">Comments
</font></th></tr><tr><td width="50" valign="top"><font face="arial" size="2"><a href="toolhist.php?tool=2372">2372</a></font></td><td width="200" valign="top"><font face="arial" size="2">CHANNEL</font></td><td width="60" valign="top"><font face="arial" size="2">PTST </font></td><td width="120" valign="top"><font face="arial" size="2">08-30-19 00:52</font></td><td width="50" valign="top"><font face="arial" size="2">75</font></td><td width="150" valign="top"><font face="arial" size="2">A*****A C******L </font></td><td width="600" valign="top"><font face="arial" size="2">Thickt5 moniotr i/p...fn9818 </font></td></tr><tr><td width="50" valign="top"><font face="arial" size="2"><a href="toolhist.php?tool=2619">2619</a></font></td><td width="200" valign="top"><font face="arial" size="2">CHANNEL</font></td><td width="60" valign="top"><font face="arial" size="2">PTST </font></td><td width="120" valign="top"><font face="arial" size="2">08-29-19 23:18</font></td><td width="50" valign="top"><font face="arial" size="2">169</font></td><td width="150" valign="top"><font face="arial" size="2">A******A C******L </font></td><td width="600" valign="top"><font face="arial" size="2">Thickt5 monitor i/p...fn9818 </font></td></tr><tr><td width="50" valign="top"><font face="arial" size="2"><a href="toolhist.php?tool=2349">2349</a></font></td><td width="200" valign="top"><font face="arial" size="2">GATE OX</font></td><td width="60" valign="top"><font face="arial" size="2">PMTST </font></td><td width="120" valign="top"><font face="arial" size="2">08-29-19 23:50</font></td><td width="50" valign="top"><font face="arial" size="2">137</font></td><td width="150" valign="top"><font face="arial" size="2">****S W***** </font></td><td width="600" valign="top"><font face="arial" size="2">Lvl1001 i\p completion@0450 </font></td></tr><tr><td width="50" valign="top"><font face="arial" size="2"><a href="toolhist.php?tool=3216">3216</a></font></td><td width="200" valign="top"><font face="arial" 
size="2">LTO DEP</font></td><td width="60" valign="top"><font face="arial" size="2">PDT </font></td><td width="120" valign="top"><font face="arial" size="2">08-30-19 01:46</font></td><td width="50" valign="top"><font face="arial" size="2">21</font></td><td width="150" valign="top"><font face="arial" size="2">**N ****S </font></td><td width="600" valign="top"><font face="arial" size="2">Cold ror is 4.1mt ****************************** temping/purging ****************************** </font></td></tr><tr><td width="50" valign="top"><font face="arial" size="2"><a href="toolhist.php?tool=2473">2473</a></font></td><td width="200" valign="top"><font face="arial" size="2">SOURCE DR</font></td><td width="60" valign="top"><font face="arial" size="2">PTST </font></td><td width="120" valign="top"><font face="arial" size="2">08-30-19 01:07</font></td><td width="50" valign="top"><font face="arial" size="2">60</font></td><td width="150" valign="top"><font face="arial" size="2">R**** A***** </font></td><td width="600" valign="top"><font face="arial" size="2">Particle i/p... </font></td></tr><tr><td width="50" valign="top"><font face="arial" size="2"><a href="toolhist.php?tool=3531">3531</a></font></td><td width="200" valign="top"><font face="arial" size="2">TRANSFER - FIELD OX</font></td><td width="60" valign="top"><font face="arial" size="2">AP </font></td><td width="120" valign="top"><font face="arial" size="2">08-28-19 15:27</font></td><td width="50" valign="top"><font face="arial" size="2">2079</font></td><td width="150" valign="top"><font face="arial" size="2">M***** C***** </font></td><td width="600" valign="top"><font face="arial" size="2">Keyboard has been shipped to the factory - will update by end of week. </font></td></tr></tbody></table>
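Solution
Use pandas read_html:
from io import StringIO
import pandas as pd
df = pd.read_html(StringIO(str(table[2])), header=[0, 1])[0]
read_html expects HTML markup (a URL, a file-like object, or a string), not a list of row tags, so pass it the markup of table[2] rather than table_rows. Wrapping the string in StringIO avoids the deprecation warning that recent pandas versions raise for literal HTML strings. I also noticed that your table has more than one header row (the DIFFUSION title row and the column-name row), which is why you need the header argument.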
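If you would rather stay with the table_rows list you already have, the rows can also be split into values with BeautifulSoup alone. The following is a minimal sketch, not a definitive implementation: SAMPLE_HTML is a shortened, hypothetical stand-in for the table in the question, rows_to_records is a helper name made up for this illustration, and the built-in 'html.parser' is used instead of 'lxml' only to keep the example free of extra dependencies.

```python
from bs4 import BeautifulSoup

# Hypothetical, shortened stand-in for the table in the question.
SAMPLE_HTML = """
<table>
<tr><td colspan="5"><b>DIFFUSION</b></td><td colspan="2"> </td></tr>
<tr><th><a href="#">ToolId</a></th><th>Type</th><th>Status</th></tr>
<tr><td><a href="toolhist.php?tool=2372">2372</a></td><td>CHANNEL</td><td>PTST </td></tr>
<tr><td><a href="toolhist.php?tool=2349">2349</a></td><td>GATE OX</td><td>PMTST </td></tr>
</table>
"""


def rows_to_records(table_rows):
    """Turn a list of <tr> tags into dicts keyed by the <th> header text."""
    headers = None
    records = []
    for row in table_rows:
        header_cells = row.find_all("th")
        if header_cells:
            # Row made of <th> cells: remember the column names.
            headers = [th.get_text(strip=True) for th in header_cells]
            continue
        if headers is None:
            # Skip title rows that come before the header row.
            continue
        values = [td.get_text(strip=True) for td in row.find_all("td")]
        records.append(dict(zip(headers, values)))
    return records


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
table_rows = soup.find("table").find_all("tr")
records = rows_to_records(table_rows)
for record in records:
    print(record)
```

With the real page you would pass your existing table_rows (from table[2]) to the helper instead of building a soup from the sample string. Each record is a plain dict, so the list can be handed to pd.DataFrame(records) if you want a DataFrame afterwards.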