How do I parse a table with Python?

Question

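I'm trying to parse a table. I have collected each row of the table into table_rows[0-8], but I can't figure out how to split the contents of each row into separate values. The page I'm scraping is an internal work site, so I can't share it, but this is the table I'm trying to scrape.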

Code:

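from bs4 import BeautifulSoup
from selenium import webdriver

# Run Chrome without opening a window.
options = webdriver.ChromeOptions()
options.add_argument('headless')

# Selenium 3 style call; Selenium 4 expects the driver path in a Service object.
driver = webdriver.Chrome(r'C:\Users\wendle\BrowserDrivers\chromedriver.exe',
                          options=options)
driver.get(wsr)  # wsr holds the URL of the internal page
html = driver.page_source
soup = BeautifulSoup(html, 'lxml')
table = soup.find_all('table')

# The table of interest is the third one on the page.
table_rows = table[2].find_all('tr')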

<table border="0" bordercolor="black" cellspacing="0" cellpadding="1" bgcolor="white" style="border-collapse:collapse"><tbody><tr><td colspan="5" bgcolor="black" valign="top"><font face="arial" size="3" color="white"><b>DIFFUSION</b>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<a href="../scrape/scrape_area.php?area=DIFFUSION"><font face="arial" size="1" color="#FFAAAA">SCRAPE</font></a></font></td><td colspan="2" bgcolor="black"><font face="arial" size="2" color="#888888">&nbsp;</font></td></tr><tr style="background-color:black; color:#888888;"><th valign="top"><font face="arial" size="2"><a class="groupheader" href="downtools.php?orderby=toolid&amp;noboth=1">ToolId</a>
</font></th><th valign="top"><font face="arial" size="2"><a href="downtools.php?orderby=type&amp;noboth=1">Type</a>
</font></th><th valign="top"><font face="arial" size="2"><a href="downtools.php?orderby=status&amp;noboth=1">Status</a>
</font></th><th valign="top"><font face="arial" size="2"><a href="downtools.php?orderby=datetime&amp;noboth=1">Date/Time</a>
</font></th><th valign="top"><font face="arial" size="2"><a href="downtools.php?orderby=datetime&amp;noboth=1">Min</a>
</font></th><th valign="top"><font face="arial" size="2"><a href="downtools.php?orderby=employee&amp;noboth=1">Employee</a>
</font></th><th valign="top"><font face="arial" size="2">Comments
</font></th></tr><tr><td width="50" valign="top"><font face="arial" size="2"><a href="toolhist.php?tool=2372">2372</a></font></td><td width="200" valign="top"><font face="arial" size="2">CHANNEL</font></td><td width="60" valign="top"><font face="arial" size="2">PTST    </font></td><td width="120" valign="top"><font face="arial" size="2">08-30-19 00:52</font></td><td width="50" valign="top"><font face="arial" size="2">75</font></td><td width="150" valign="top"><font face="arial" size="2">A*****A C******L            </font></td><td width="600" valign="top"><font face="arial" size="2">Thickt5 moniotr i/p...fn9818 </font></td></tr><tr><td width="50" valign="top"><font face="arial" size="2"><a href="toolhist.php?tool=2619">2619</a></font></td><td width="200" valign="top"><font face="arial" size="2">CHANNEL</font></td><td width="60" valign="top"><font face="arial" size="2">PTST    </font></td><td width="120" valign="top"><font face="arial" size="2">08-29-19 23:18</font></td><td width="50" valign="top"><font face="arial" size="2">169</font></td><td width="150" valign="top"><font face="arial" size="2">A******A C******L            </font></td><td width="600" valign="top"><font face="arial" size="2">Thickt5 monitor i/p...fn9818 </font></td></tr><tr><td width="50" valign="top"><font face="arial" size="2"><a href="toolhist.php?tool=2349">2349</a></font></td><td width="200" valign="top"><font face="arial" size="2">GATE OX</font></td><td width="60" valign="top"><font face="arial" size="2">PMTST   </font></td><td width="120" valign="top"><font face="arial" size="2">08-29-19 23:50</font></td><td width="50" valign="top"><font face="arial" size="2">137</font></td><td width="150" valign="top"><font face="arial" size="2">****S W*****                  </font></td><td width="600" valign="top"><font face="arial" size="2">Lvl1001 i\p completion@0450 </font></td></tr><tr><td width="50" valign="top"><font face="arial" size="2"><a href="toolhist.php?tool=3216">3216</a></font></td><td 
width="200" valign="top"><font face="arial" size="2">LTO  DEP</font></td><td width="60" valign="top"><font face="arial" size="2">PDT     </font></td><td width="120" valign="top"><font face="arial" size="2">08-30-19 01:46</font></td><td width="50" valign="top"><font face="arial" size="2">21</font></td><td width="150" valign="top"><font face="arial" size="2">**N ****S                     </font></td><td width="600" valign="top"><font face="arial" size="2">Cold ror is 4.1mt ****************************** temping/purging ****************************** </font></td></tr><tr><td width="50" valign="top"><font face="arial" size="2"><a href="toolhist.php?tool=2473">2473</a></font></td><td width="200" valign="top"><font face="arial" size="2">SOURCE DR</font></td><td width="60" valign="top"><font face="arial" size="2">PTST    </font></td><td width="120" valign="top"><font face="arial" size="2">08-30-19 01:07</font></td><td width="50" valign="top"><font face="arial" size="2">60</font></td><td width="150" valign="top"><font face="arial" size="2">R**** A*****                </font></td><td width="600" valign="top"><font face="arial" size="2">Particle i/p... </font></td></tr><tr><td width="50" valign="top"><font face="arial" size="2"><a href="toolhist.php?tool=3531">3531</a></font></td><td width="200" valign="top"><font face="arial" size="2">TRANSFER - FIELD OX</font></td><td width="60" valign="top"><font face="arial" size="2">AP      </font></td><td width="120" valign="top"><font face="arial" size="2">08-28-19 15:27</font></td><td width="50" valign="top"><font face="arial" size="2">2079</font></td><td width="150" valign="top"><font face="arial" size="2">M***** C*****            </font></td><td width="600" valign="top"><font face="arial" size="2">Keyboard has been shipped to the factory - will update by end of week. </font></td></tr></tbody></table>

Tags: python, python-3.x

Answer


Use pandas read_html:

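from io import StringIO
import pandas as pd
# read_html needs the table's HTML text, not a list of <tr> tags
df = pd.read_html(StringIO(str(table[2])), header=[0, 1])[0]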

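I also noticed that the table has header rows (the DIFFUSION title row and the row of column names), so you need the header parameter.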
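
To try this without access to the internal site, here is a minimal sketch that runs read_html on a trimmed-down copy of the table. SAMPLE_HTML is a hypothetical stand-in for str(table[2]); the step that drops the title level of the columns and the date format string are my assumptions about what is wanted, not part of the original answer. read_html also needs an HTML parser such as lxml to be installed.

```python
from io import StringIO

import pandas as pd

# Hypothetical sample shaped like the table in the question:
# row 0 is the area title, row 1 holds the column names, the rest is data.
SAMPLE_HTML = """
<table>
<tr><td colspan="5"><b>DIFFUSION</b></td><td colspan="2">&nbsp;</td></tr>
<tr><th><a href="#">ToolId</a></th><th>Type</th><th>Status</th><th>Date/Time</th>
    <th>Min</th><th>Employee</th><th>Comments</th></tr>
<tr><td>2372</td><td>CHANNEL</td><td>PTST    </td><td>08-30-19 00:52</td>
    <td>75</td><td>A*****A C******L</td><td>Thickt5 monitor i/p...fn9818 </td></tr>
<tr><td>2619</td><td>CHANNEL</td><td>PTST    </td><td>08-29-19 23:18</td>
    <td>169</td><td>A******A C******L</td><td>Thickt5 monitor i/p...fn9818 </td></tr>
</table>
"""

# Wrapping the text in StringIO works on old and new pandas versions;
# recent versions warn when a literal HTML string is passed directly.
df = pd.read_html(StringIO(SAMPLE_HTML), header=[0, 1])[0]

# header=[0, 1] yields two-level column labels (title row, column names).
# Keep only the column names so columns can be addressed as df["ToolId"].
df.columns = df.columns.droplevel(0)

# Assumed format of the Date/Time column: month-day-year hour:minute.
df["Date/Time"] = pd.to_datetime(df["Date/Time"], format="%m-%d-%y %H:%M")

print(df[["ToolId", "Status", "Date/Time", "Min"]])
```

With the real page, replace StringIO(SAMPLE_HTML) with StringIO(str(table[2])). If the title row is of no interest at all, header=1 should skip it and use only the row of column names.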

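If you would rather stay with the table_rows list you already have, the rows can also be split by hand with BeautifulSoup. This is only a sketch under the same assumptions as the layout in the question (row 0 is the title, row 1 the column names); SAMPLE_HTML, headers and records are hypothetical names, and every value stays a string.

```python
from bs4 import BeautifulSoup

# Hypothetical sample shaped like the table in the question.
SAMPLE_HTML = """
<table>
<tr><td colspan="5"><b>DIFFUSION</b></td><td colspan="2">&nbsp;</td></tr>
<tr><th><a href="#">ToolId</a>
</th><th>Type</th><th>Status</th><th>Date/Time</th>
    <th>Min</th><th>Employee</th><th>Comments</th></tr>
<tr><td><a href="#">2372</a></td><td>CHANNEL</td><td>PTST    </td>
    <td>08-30-19 00:52</td><td>75</td><td>A*****A C******L</td>
    <td>Thickt5 monitor i/p...fn9818 </td></tr>
<tr><td><a href="#">2619</a></td><td>CHANNEL</td><td>PTST    </td>
    <td>08-29-19 23:18</td><td>169</td><td>A******A C******L</td>
    <td>Thickt5 monitor i/p...fn9818 </td></tr>
</table>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
table_rows = soup.find("table").find_all("tr")

# Row 1 holds the column names; get_text(strip=True) drops the
# surrounding whitespace and the markup inside each cell.
headers = [th.get_text(strip=True) for th in table_rows[1].find_all("th")]

# Every later row is data: pair each cell with its column name.
records = []
for row in table_rows[2:]:
    cells = [td.get_text(strip=True) for td in row.find_all("td")]
    records.append(dict(zip(headers, cells)))

for record in records:
    print(record["ToolId"], record["Status"], record["Min"])
```

The list of dictionaries can be passed to pandas.DataFrame(records) later if a table-like structure is needed.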
