How to delete columns from an HTML table by color with Python

Problem description

I have the HTML table below, and I want to delete every column that has no background color (the white cells):

soup = BeautifulSoup(htmlbody, 'html.parser')
table1 = soup.find_all('table')[3]
print(table1)

<table border="0" cellpadding="0" cellspacing="0" class="MsoNormalTable" style="background:#F6F6F6;border-collapse:collapse"><tr style="vertical-align:center"><td style="border:solid windowtext 1.0pt;background:white;padding:.75pt .75pt .75pt .75pt"><p align="center" style="text-align:center"><span style='font-size:9.0pt;font-family:"Arial",sans-serif;color:black'>KPI ↓&lt;o:p></o:p></span></p></td><td style="border:solid windowtext 1.0pt;border-left:none;background:white;padding:.75pt .75pt .75pt .75pt"><p align="center" style="text-align:center"><span style='font-size:9.0pt;font-family:"Arial",sans-serif;color:black'>ch<o:p></o:p></span></p></td><td style="border:solid windowtext 1.0pt;border-left:none;background:white;padding:.75pt .75pt .75pt .75pt"><p align="center" style="text-align:center"><span style='font-size:9.0pt;font-family:"Arial",sans-serif;color:black'>eg<o:p></o:p></span></p></td><td style="border:solid windowtext 1.0pt;border-left:none;background:white;padding:.75pt .75pt .75pt .75pt"><p align="center" style="text-align:center"><span style='font-size:9.0pt;font-family:"Arial",sans-serif;color:black'>fr<o:p></o:p></span></p></td><td style="border:solid windowtext 1.0pt;border-left:none;background:white;padding:.75pt .75pt .75pt .75pt"><p align="center" style="text-align:center"><span style='font-size:9.0pt;font-family:"Arial",sans-serif;color:black'>om<o:p></o:p></span></p></td><td style="border:solid windowtext 1.0pt;border-left:none;background:white;padding:.75pt .75pt .75pt .75pt"><p align="center" style="text-align:center"><span style='font-size:9.0pt;font-family:"Arial",sans-serif;color:black'>sa<o:p></o:p></span></p></td><td style="border:solid windowtext 1.0pt;border-left:none;background:white;padding:.75pt .75pt .75pt .75pt"><p align="center" style="text-align:center"><span style='font-size:9.0pt;font-family:"Arial",sans-serif;color:black'>sa m<o:p></o:p></span></p></td><td style="border:solid windowtext 
1.0pt;border-left:none;background:white;padding:.75pt .75pt .75pt .75pt"><p align="center" style="text-align:center"><span style='font-size:9.0pt;font-family:"Arial",sans-serif;color:black'>tr<o:p></o:p></span></p></td><td style="border:solid windowtext 1.0pt;border-left:none;background:white;padding:.75pt .75pt .75pt .75pt"><p align="center" style="text-align:center"><span style='font-size:9.0pt;font-family:"Arial",sans-serif;color:black'>uk<o:p></o:p></span></p></td><td style="border:solid windowtext 1.0pt;border-left:none;background:white;padding:.75pt .75pt .75pt .75pt"><p align="center" style="text-align:center"><span style='font-size:9.0pt;font-family:"Arial",sans-serif;color:black'>ukv<o:p></o:p></span></p></td><td style="border:solid windowtext 1.0pt;border-left:none;background:white;padding:.75pt .75pt .75pt .75pt"><p align="center" style="text-align:center"><span style='font-size:9.0pt;font-family:"Arial",sans-serif;color:black'>ust<o:p></o:p></span></p></td><td style="border:solid windowtext 1.0pt;border-left:none;background:white;padding:.75pt .75pt .75pt .75pt"><p align="center" style="text-align:center"><span style='font-size:9.0pt;font-family:"Arial",sans-serif;color:black'>avg<o:p></o:p></span></p></td></tr><tr><td style="border:solid windowtext 1.0pt;border-top:none;background:white;padding:.75pt .75pt .75pt .75pt;vertical-align:center"><p class="MsoNormal"><span style='font-size:9.0pt;font-family:"Arial",sans-serif;color:black'>Update1<o:p></o:p></span></p></td><td style="border-top:none;border-left:none;border-bottom:solid windowtext 1.0pt;border-right:solid windowtext 1.0pt;background:#00CC00;padding:.75pt .75pt .75pt .75pt;vertical-align:center"><p align="right" class="MsoNormal" style="text-align:right"><span style='font-size:9.0pt;font-family:"Arial",sans-serif'>100<o:p></o:p></span></p></td><td style="border-top:none;border-left:none;border-bottom:solid windowtext 1.0pt;border-right:solid windowtext 1.0pt;background:#00CC00;padding:.75pt 
.75pt .75pt .75pt;vertical-align:center"><p align="right" class="MsoNormal" style="text-align:right"><span style='font-size:9.0pt;font-family:"Arial",sans-serif'>100<o:p></o:p></span></p></td><td style="border-top:none;border-left:none;border-bottom:solid windowtext 1.0pt;border-right:solid windowtext 1.0pt;background:#00CC00;padding:.75pt .75pt .75pt .75pt;vertical-align:center"><p align="right" class="MsoNormal" style="text-align:right"><span style='font-size:9.0pt;font-family:"Arial",sans-serif'>100<o:p></o:p></span></p></td><td style="border-top:none;border-left:none;border-bottom:solid windowtext 1.0pt;border-right:solid windowtext 1.0pt;background:#00CC00;padding:.75pt .75pt .75pt .75pt;vertical-align:center"><p align="right" class="MsoNormal" style="text-align:right"><span style='font-size:9.0pt;font-family:"Arial",sans-serif'>100<o:p></o:p></span></p></td><td style="border-top:none;border-left:none;border-bottom:solid windowtext 1.0pt;border-right:solid windowtext 1.0pt;background:#00CC00;padding:.75pt .75pt .75pt .75pt;vertical-align:center"><p align="right" class="MsoNormal" style="text-align:right"><span style='font-size:9.0pt;font-family:"Arial",sans-serif'>100<o:p></o:p></span></p></td><td style="border-top:none;border-left:none;border-bottom:solid windowtext 1.0pt;border-right:solid windowtext 1.0pt;background:#00CC00;padding:.75pt .75pt .75pt .75pt;vertical-align:center"><p align="right" class="MsoNormal" style="text-align:right"><span style='font-size:9.0pt;font-family:"Arial",sans-serif'>100<o:p></o:p></span></p></td><td style="border-top:none;border-left:none;border-bottom:solid windowtext 1.0pt;border-right:solid windowtext 1.0pt;background:#00CC00;padding:.75pt .75pt .75pt .75pt;vertical-align:center"><p align="right" class="MsoNormal" style="text-align:right"><span style='font-size:9.0pt;font-family:"Arial",sans-serif'>100<o:p></o:p></span></p></td><td style="border-top:none;border-left:none;border-bottom:solid windowtext 
1.0pt;border-right:solid windowtext 1.0pt;background:#00CC00;padding:.75pt .75pt .75pt .75pt;vertical-align:center"><p align="right" class="MsoNormal" style="text-align:right"><span style='font-size:9.0pt;font-family:"Arial",sans-serif'>100<o:p></o:p></span></p></td><td style="border-top:none;border-left:none;border-bottom:solid windowtext 1.0pt;border-right:solid windowtext 1.0pt;background:#00CC00;padding:.75pt .75pt .75pt .75pt;vertical-align:center"><p align="right" class="MsoNormal" style="text-align:right"><span style='font-size:9.0pt;font-family:"Arial",sans-serif'>100<o:p></o:p></span></p></td><td style="border-top:none;border-left:none;border-bottom:solid windowtext 1.0pt;border-right:solid windowtext 1.0pt;padding:.75pt .75pt .75pt .75pt;vertical-align:center"><p align="right" class="MsoNormal" style="text-align:right"><span style='font-size:9.0pt;font-family:"Arial",sans-serif'>--<o:p></o:p></span></p></td><td style="border-top:none;border-left:none;border-bottom:solid windowtext 1.0pt;border-right:solid windowtext 1.0pt;background:#00CC00;padding:.75pt .75pt .75pt .75pt;vertical-align:center"><p align="right" class="MsoNormal" style="text-align:right"><span style='font-size:9.0pt;font-family:"Arial",sans-serif'>100<o:p></o:p></span></p></td></tr><tr><td style="border:solid windowtext 1.0pt;border-top:none;background:white;padding:.75pt .75pt .75pt .75pt;vertical-align:center"><p class="MsoNormal"><span style='font-size:9.0pt;font-family:"Arial",sans-serif;color:black'>Web1<o:p></o:p></span></p></td><td style="border-top:none;border-left:none;border-bottom:solid windowtext 1.0pt;border-right:solid windowtext 1.0pt;background:#00CC00;padding:.75pt .75pt .75pt .75pt;vertical-align:center"><p align="right" class="MsoNormal" style="text-align:right"><span style='font-size:9.0pt;font-family:"Arial",sans-serif'>100<o:p></o:p></span></p></td><td style="border-top:none;border-left:none;border-bottom:solid windowtext 1.0pt;border-right:solid windowtext 
1.0pt;background:#00CC00;padding:.75pt .75pt .75pt .75pt;vertical-align:center"><p align="right" class="MsoNormal" style="text-align:right"><span style='font-size:9.0pt;font-family:"Arial",sans-serif'>100<o:p></o:p></span></p></td><td style="border-top:none;border-left:none;border-bottom:solid windowtext 1.0pt;border-right:solid windowtext 1.0pt;background:#00CC00;padding:.75pt .75pt .75pt .75pt;vertical-align:center"><p align="right" class="MsoNormal" style="text-align:right"><span style='font-size:9.0pt;font-family:"Arial",sans-serif'>100<o:p></o:p></span></p></td><td style="border-top:none;border-left:none;border-bottom:solid windowtext 1.0pt;border-right:solid windowtext 1.0pt;background:#00CC00;padding:.75pt .75pt .75pt .75pt;vertical-align:center"><p align="right" class="MsoNormal" style="text-align:right"><span style='font-size:9.0pt;font-family:"Arial",sans-serif'>100<o:p></o:p></span></p></td><td style="border-top:none;border-left:none;border-bottom:solid windowtext 1.0pt;border-right:solid windowtext 1.0pt;background:#00CC00;padding:.75pt .75pt .75pt .75pt;vertical-align:center"><p align="right" class="MsoNormal" style="text-align:right"><span style='font-size:9.0pt;font-family:"Arial",sans-serif'>100<o:p></o:p></span></p></td><td style="border-top:none;border-left:none;border-bottom:solid windowtext 1.0pt;border-right:solid windowtext 1.0pt;background:#00CC00;padding:.75pt .75pt .75pt .75pt;vertical-align:center"><p align="right" class="MsoNormal" style="text-align:right"><span style='font-size:9.0pt;font-family:"Arial",sans-serif'>100<o:p></o:p></span></p></td><td style="border-top:none;border-left:none;border-bottom:solid windowtext 1.0pt;border-right:solid windowtext 1.0pt;background:#00CC00;padding:.75pt .75pt .75pt .75pt;vertical-align:center"><p align="right" class="MsoNormal" style="text-align:right"><span style='font-size:9.0pt;font-family:"Arial",sans-serif'>100<o:p></o:p></span></p></td><td 
style="border-top:none;border-left:none;border-bottom:solid windowtext 1.0pt;border-right:solid windowtext 1.0pt;padding:.75pt .75pt .75pt .75pt;vertical-align:center"><p align="right" class="MsoNormal" style="text-align:right"><span style='font-size:9.0pt;font-family:"Arial",sans-serif'>--<o:p></o:p></span></p></td><td style="border-top:none;border-left:none;border-bottom:solid windowtext 1.0pt;border-right:solid windowtext 1.0pt;background:#00CC00;padding:.75pt .75pt .75pt .75pt;vertical-align:center"><p align="right" class="MsoNormal" style="text-align:right"><span style='font-size:9.0pt;font-family:"Arial",sans-serif'>100<o:p></o:p></span></p></td><td style="border-top:none;border-left:none;border-bottom:solid windowtext 1.0pt;border-right:solid windowtext 1.0pt;background:#00CC00;padding:.75pt .75pt .75pt .75pt;vertical-align:center"><p align="right" class="MsoNormal" style="text-align:right"><span style='font-size:9.0pt;font-family:"Arial",sans-serif'>100<o:p></o:p></span></p></td><td style="border-top:none;border-left:none;border-bottom:solid windowtext 1.0pt;border-right:solid windowtext 1.0pt;background:#00CC00;padding:.75pt .75pt .75pt .75pt;vertical-align:center"><p align="right" class="MsoNormal" style="text-align:right"><span style='font-size:9.0pt;font-family:"Arial",sans-serif'>100<o:p></o:p></span></p></td></tr><tr><td style="border:solid windowtext 1.0pt;border-top:none;background:white;padding:.75pt .75pt .75pt .75pt;vertical-align:center"><p class="MsoNormal"><span style='font-size:9.0pt;font-family:"Arial",sans-serif;color:black'>Update2<o:p></o:p></span></p></td><td style="border-top:none;border-left:none;border-bottom:solid windowtext 1.0pt;border-right:solid windowtext 1.0pt;background:#00CC00;padding:.75pt .75pt .75pt .75pt;vertical-align:center"><p align="right" class="MsoNormal" style="text-align:right"><span style='font-size:9.0pt;font-family:"Arial",sans-serif'>100<o:p></o:p></span></p></td><td 
style="border-top:none;border-left:none;border-bottom:solid windowtext 1.0pt;border-right:solid windowtext 1.0pt;background:#00CC00;padding:.75pt .75pt .75pt .75pt;vertical-align:center"><p align="right" class="MsoNormal" style="text-align:right"><span style='font-size:9.0pt;font-family:"Arial",sans-serif'>100<o:p></o:p></span></p></td><td style="border-top:none;border-left:none;border-bottom:solid windowtext 1.0pt;border-right:solid windowtext 1.0pt;background:#00CC00;padding:.75pt .75pt .75pt .75pt;vertical-align:center"><p align="right" class="MsoNormal" style="text-align:right"><span style='font-size:9.0pt;font-family:"Arial",sans-serif'>100<o:p></o:p></span></p></td><td style="border-top:none;border-left:none;border-bottom:solid windowtext 1.0pt;border-right:solid windowtext 1.0pt;background:#00CC00;padding:.75pt .75pt .75pt .75pt;vertical-align:center"><p align="right" class="MsoNormal" style="text-align:right"><span style='font-size:9.0pt;font-family:"Arial",sans-serif'>100<o:p></o:p></span></p></td><td style="border-top:none;border-left:none;border-bottom:solid windowtext 1.0pt;border-right:solid windowtext 1.0pt;background:#00CC00;padding:.75pt .75pt .75pt .75pt;vertical-align:center"><p align="right" class="MsoNormal" style="text-align:right"><span style='font-size:9.0pt;font-family:"Arial",sans-serif'>100<o:p></o:p></span></p></td><td style="border-top:none;border-left:none;border-bottom:solid windowtext 1.0pt;border-right:solid windowtext 1.0pt;background:#00CC00;padding:.75pt .75pt .75pt .75pt;vertical-align:center"><p align="right" class="MsoNormal" style="text-align:right"><span style='font-size:9.0pt;font-family:"Arial",sans-serif'>100<o:p></o:p></span></p></td><td style="border-top:none;border-left:none;border-bottom:solid windowtext 1.0pt;border-right:solid windowtext 1.0pt;background:#00CC00;padding:.75pt .75pt .75pt .75pt;vertical-align:center"><p align="right" class="MsoNormal" style="text-align:right"><span 
style='font-size:9.0pt;font-family:"Arial",sans-serif'>100<o:p></o:p></span></p></td><td style="border-top:none;border-left:none;border-bottom:solid windowtext 1.0pt;border-right:solid windowtext 1.0pt;padding:.75pt .75pt .75pt .75pt;vertical-align:center"><p align="right" class="MsoNormal" style="text-align:right"><span style='font-size:9.0pt;font-family:"Arial",sans-serif'>--<o:p></o:p></span></p></td><td style="border-top:none;border-left:none;border-bottom:solid windowtext 1.0pt;border-right:solid windowtext 1.0pt;background:#00CC00;padding:.75pt .75pt .75pt .75pt;vertical-align:center"><p align="right" class="MsoNormal" style="text-align:right"><span style='font-size:9.0pt;font-family:"Arial",sans-serif'>100<o:p></o:p></span></p></td><td style="border-top:none;border-left:none;border-bottom:solid windowtext 1.0pt;border-right:solid windowtext 1.0pt;background:#00CC00;padding:.75pt .75pt .75pt .75pt;vertical-align:center"><p align="right" class="MsoNormal" style="text-align:right"><span style='font-size:9.0pt;font-family:"Arial",sans-serif'>100<o:p></o:p></span></p></td><td style="border-top:none;border-left:none;border-bottom:solid windowtext 1.0pt;border-right:solid windowtext 1.0pt;background:#00CC00;padding:.75pt .75pt .75pt .75pt;vertical-align:center"><p align="right" class="MsoNormal" style="text-align:right"><span style='font-size:9.0pt;font-family:"Arial",sans-serif'>100<o:p></o:p></span></p></td></tr></table>

Tags: html, python-3.x

Solution


Try this to remove the td elements that have this style:

for t in table1.find_all("td"):
    # .get() avoids a KeyError on cells that have no style attribute
    if "background:white" in t.get("style", ""):
        t.decompose()
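Note that the loop above deletes individual cells, not whole columns, so rows can end up with different numbers of cells. A minimal sketch of that effect follows; `SAMPLE_HTML` and `remove_white_cells` are hypothetical names, and the table is a made-up miniature of the one in the question.

```python
from bs4 import BeautifulSoup

# Hypothetical miniature of the question's table: an all-white header row,
# a white label cell, one green data cell and one cell with no background.
SAMPLE_HTML = """
<table>
  <tr><td style="background:white">KPI</td><td style="background:white">ch</td><td style="background:white">ust</td></tr>
  <tr><td style="background:white">Update1</td><td style="background:#00CC00">100</td><td>--</td></tr>
</table>
"""


def remove_white_cells(table):
    """Delete every td whose inline style contains 'background:white'."""
    for td in table.find_all("td"):
        if "background:white" in td.get("style", ""):
            td.decompose()
    return table


table = remove_white_cells(BeautifulSoup(SAMPLE_HTML, "html.parser").find("table"))
cells_per_row = [len(tr.find_all("td")) for tr in table.find_all("tr")]
print(cells_per_row)  # [0, 2]
```

The header row loses all three cells while the data row keeps two, which is why removing a column needs a per-column decision rather than a per-cell one.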

To delete the columns whose td cells have that style, you can combine BeautifulSoup and pandas:

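from io import StringIO

from bs4 import BeautifulSoup
import pandas as pd

soup = BeautifulSoup(htmlbody, 'html.parser')
table1 = soup.find_all('table')[3]

# Collect the positions of the columns that contain a white cell.
# The header row is skipped: in the sample table every header cell is white,
# so including it would mark every column for removal.
cols = set()
for tr in table1.find_all("tr")[1:]:
    for i, td in enumerate(tr.find_all("td")):
        # .get() avoids a KeyError on cells that have no style attribute
        if "background:white" in td.get("style", ""):
            cols.add(i)

# read_html expects HTML text or a file-like object, not a Tag
df = pd.read_html(StringIO(str(table1)))[0]
df = df.drop(columns=df.columns[sorted(cols)])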
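If "no background" instead means the cells whose inline style declares no `background` property at all (the `--` cells in the sample), the columns can also be removed directly in the parsed tree, which keeps the HTML rather than converting it to a DataFrame. The following is only a sketch under several assumptions: the first row is a header, every row has the same number of cells, there are no `colspan`/`rowspan` attributes or nested tables, and a column is dropped only when none of its data cells declares a background. `SAMPLE_HTML`, `has_background` and `drop_columns_without_background` are hypothetical names.

```python
from bs4 import BeautifulSoup

# Hypothetical miniature of the question's table; the "ust" column has
# data cells without any background declaration.
SAMPLE_HTML = """
<table>
  <tr><td style="background:white">KPI</td><td style="background:white">ch</td><td style="background:white">ust</td><td style="background:white">avg</td></tr>
  <tr><td style="background:white">Update1</td><td style="background:#00CC00">100</td><td style="padding:.75pt">--</td><td style="background:#00CC00">100</td></tr>
  <tr><td style="background:white">Web1</td><td style="background:#00CC00">100</td><td style="padding:.75pt">--</td><td style="background:#00CC00">100</td></tr>
</table>
"""


def has_background(td):
    """Return True if the cell's inline style declares a background property."""
    declarations = td.get("style", "").split(";")
    names = [d.split(":", 1)[0].strip().lower() for d in declarations]
    return any(name in ("background", "background-color") for name in names)


def drop_columns_without_background(table):
    """Remove columns whose data cells all lack a background; return their indexes."""
    rows = table.find_all("tr")
    data_rows = rows[1:]  # assumption: the first row is the header
    n_cols = len(rows[0].find_all("td"))
    to_drop = [
        i for i in range(n_cols)
        if data_rows
        and not any(has_background(tr.find_all("td")[i]) for tr in data_rows)
    ]
    for tr in rows:
        cells = tr.find_all("td")  # snapshot, so indexes stay valid while deleting
        for i in to_drop:
            cells[i].decompose()
    return to_drop


table = BeautifulSoup(SAMPLE_HTML, "html.parser").find("table")
dropped = drop_columns_without_background(table)
header = [td.get_text(strip=True) for td in table.find("tr").find_all("td")]
print(dropped, header)  # [2] ['KPI', 'ch', 'avg']
```

To drop a column as soon as any single data cell lacks a background, the `not any(...)` test could be swapped for `not all(...)`.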

I hope I have understood your request correctly.

