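html - How to delete columns from an HTML table by color using Python
Problem description
I have the HTML table below, and I want to delete all columns that have no background: the white cells.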
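from bs4 import BeautifulSoup

# htmlbody holds the HTML source of the page; the table of interest is the fourth one
soup = BeautifulSoup(htmlbody, 'html.parser')
table1 = soup.find_all('table')[3]
print(table1)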
<table border="0" cellpadding="0" cellspacing="0" class="MsoNormalTable" style="background:#F6F6F6;border-collapse:collapse"><tr style="vertical-align:center"><td style="border:solid windowtext 1.0pt;background:white;padding:.75pt .75pt .75pt .75pt"><p align="center" style="text-align:center"><span style='font-size:9.0pt;font-family:"Arial",sans-serif;color:black'>KPI ↓<o:p></o:p></span></p></td><td style="border:solid windowtext 1.0pt;border-left:none;background:white;padding:.75pt .75pt .75pt .75pt"><p align="center" style="text-align:center"><span style='font-size:9.0pt;font-family:"Arial",sans-serif;color:black'>ch<o:p></o:p></span></p></td><td style="border:solid windowtext 1.0pt;border-left:none;background:white;padding:.75pt .75pt .75pt .75pt"><p align="center" style="text-align:center"><span style='font-size:9.0pt;font-family:"Arial",sans-serif;color:black'>eg<o:p></o:p></span></p></td><td style="border:solid windowtext 1.0pt;border-left:none;background:white;padding:.75pt .75pt .75pt .75pt"><p align="center" style="text-align:center"><span style='font-size:9.0pt;font-family:"Arial",sans-serif;color:black'>fr<o:p></o:p></span></p></td><td style="border:solid windowtext 1.0pt;border-left:none;background:white;padding:.75pt .75pt .75pt .75pt"><p align="center" style="text-align:center"><span style='font-size:9.0pt;font-family:"Arial",sans-serif;color:black'>om<o:p></o:p></span></p></td><td style="border:solid windowtext 1.0pt;border-left:none;background:white;padding:.75pt .75pt .75pt .75pt"><p align="center" style="text-align:center"><span style='font-size:9.0pt;font-family:"Arial",sans-serif;color:black'>sa<o:p></o:p></span></p></td><td style="border:solid windowtext 1.0pt;border-left:none;background:white;padding:.75pt .75pt .75pt .75pt"><p align="center" style="text-align:center"><span style='font-size:9.0pt;font-family:"Arial",sans-serif;color:black'>sa m<o:p></o:p></span></p></td><td style="border:solid windowtext 
1.0pt;border-left:none;background:white;padding:.75pt .75pt .75pt .75pt"><p align="center" style="text-align:center"><span style='font-size:9.0pt;font-family:"Arial",sans-serif;color:black'>tr<o:p></o:p></span></p></td><td style="border:solid windowtext 1.0pt;border-left:none;background:white;padding:.75pt .75pt .75pt .75pt"><p align="center" style="text-align:center"><span style='font-size:9.0pt;font-family:"Arial",sans-serif;color:black'>uk<o:p></o:p></span></p></td><td style="border:solid windowtext 1.0pt;border-left:none;background:white;padding:.75pt .75pt .75pt .75pt"><p align="center" style="text-align:center"><span style='font-size:9.0pt;font-family:"Arial",sans-serif;color:black'>ukv<o:p></o:p></span></p></td><td style="border:solid windowtext 1.0pt;border-left:none;background:white;padding:.75pt .75pt .75pt .75pt"><p align="center" style="text-align:center"><span style='font-size:9.0pt;font-family:"Arial",sans-serif;color:black'>ust<o:p></o:p></span></p></td><td style="border:solid windowtext 1.0pt;border-left:none;background:white;padding:.75pt .75pt .75pt .75pt"><p align="center" style="text-align:center"><span style='font-size:9.0pt;font-family:"Arial",sans-serif;color:black'>avg<o:p></o:p></span></p></td></tr><tr><td style="border:solid windowtext 1.0pt;border-top:none;background:white;padding:.75pt .75pt .75pt .75pt;vertical-align:center"><p class="MsoNormal"><span style='font-size:9.0pt;font-family:"Arial",sans-serif;color:black'>Update1<o:p></o:p></span></p></td><td style="border-top:none;border-left:none;border-bottom:solid windowtext 1.0pt;border-right:solid windowtext 1.0pt;background:#00CC00;padding:.75pt .75pt .75pt .75pt;vertical-align:center"><p align="right" class="MsoNormal" style="text-align:right"><span style='font-size:9.0pt;font-family:"Arial",sans-serif'>100<o:p></o:p></span></p></td><td style="border-top:none;border-left:none;border-bottom:solid windowtext 1.0pt;border-right:solid windowtext 1.0pt;background:#00CC00;padding:.75pt 
.75pt .75pt .75pt;vertical-align:center"><p align="right" class="MsoNormal" style="text-align:right"><span style='font-size:9.0pt;font-family:"Arial",sans-serif'>100<o:p></o:p></span></p></td><td style="border-top:none;border-left:none;border-bottom:solid windowtext 1.0pt;border-right:solid windowtext 1.0pt;background:#00CC00;padding:.75pt .75pt .75pt .75pt;vertical-align:center"><p align="right" class="MsoNormal" style="text-align:right"><span style='font-size:9.0pt;font-family:"Arial",sans-serif'>100<o:p></o:p></span></p></td><td style="border-top:none;border-left:none;border-bottom:solid windowtext 1.0pt;border-right:solid windowtext 1.0pt;background:#00CC00;padding:.75pt .75pt .75pt .75pt;vertical-align:center"><p align="right" class="MsoNormal" style="text-align:right"><span style='font-size:9.0pt;font-family:"Arial",sans-serif'>100<o:p></o:p></span></p></td><td style="border-top:none;border-left:none;border-bottom:solid windowtext 1.0pt;border-right:solid windowtext 1.0pt;background:#00CC00;padding:.75pt .75pt .75pt .75pt;vertical-align:center"><p align="right" class="MsoNormal" style="text-align:right"><span style='font-size:9.0pt;font-family:"Arial",sans-serif'>100<o:p></o:p></span></p></td><td style="border-top:none;border-left:none;border-bottom:solid windowtext 1.0pt;border-right:solid windowtext 1.0pt;background:#00CC00;padding:.75pt .75pt .75pt .75pt;vertical-align:center"><p align="right" class="MsoNormal" style="text-align:right"><span style='font-size:9.0pt;font-family:"Arial",sans-serif'>100<o:p></o:p></span></p></td><td style="border-top:none;border-left:none;border-bottom:solid windowtext 1.0pt;border-right:solid windowtext 1.0pt;background:#00CC00;padding:.75pt .75pt .75pt .75pt;vertical-align:center"><p align="right" class="MsoNormal" style="text-align:right"><span style='font-size:9.0pt;font-family:"Arial",sans-serif'>100<o:p></o:p></span></p></td><td style="border-top:none;border-left:none;border-bottom:solid windowtext 
1.0pt;border-right:solid windowtext 1.0pt;background:#00CC00;padding:.75pt .75pt .75pt .75pt;vertical-align:center"><p align="right" class="MsoNormal" style="text-align:right"><span style='font-size:9.0pt;font-family:"Arial",sans-serif'>100<o:p></o:p></span></p></td><td style="border-top:none;border-left:none;border-bottom:solid windowtext 1.0pt;border-right:solid windowtext 1.0pt;background:#00CC00;padding:.75pt .75pt .75pt .75pt;vertical-align:center"><p align="right" class="MsoNormal" style="text-align:right"><span style='font-size:9.0pt;font-family:"Arial",sans-serif'>100<o:p></o:p></span></p></td><td style="border-top:none;border-left:none;border-bottom:solid windowtext 1.0pt;border-right:solid windowtext 1.0pt;padding:.75pt .75pt .75pt .75pt;vertical-align:center"><p align="right" class="MsoNormal" style="text-align:right"><span style='font-size:9.0pt;font-family:"Arial",sans-serif'>--<o:p></o:p></span></p></td><td style="border-top:none;border-left:none;border-bottom:solid windowtext 1.0pt;border-right:solid windowtext 1.0pt;background:#00CC00;padding:.75pt .75pt .75pt .75pt;vertical-align:center"><p align="right" class="MsoNormal" style="text-align:right"><span style='font-size:9.0pt;font-family:"Arial",sans-serif'>100<o:p></o:p></span></p></td></tr><tr><td style="border:solid windowtext 1.0pt;border-top:none;background:white;padding:.75pt .75pt .75pt .75pt;vertical-align:center"><p class="MsoNormal"><span style='font-size:9.0pt;font-family:"Arial",sans-serif;color:black'>Web1<o:p></o:p></span></p></td><td style="border-top:none;border-left:none;border-bottom:solid windowtext 1.0pt;border-right:solid windowtext 1.0pt;background:#00CC00;padding:.75pt .75pt .75pt .75pt;vertical-align:center"><p align="right" class="MsoNormal" style="text-align:right"><span style='font-size:9.0pt;font-family:"Arial",sans-serif'>100<o:p></o:p></span></p></td><td style="border-top:none;border-left:none;border-bottom:solid windowtext 1.0pt;border-right:solid windowtext 
1.0pt;background:#00CC00;padding:.75pt .75pt .75pt .75pt;vertical-align:center"><p align="right" class="MsoNormal" style="text-align:right"><span style='font-size:9.0pt;font-family:"Arial",sans-serif'>100<o:p></o:p></span></p></td><td style="border-top:none;border-left:none;border-bottom:solid windowtext 1.0pt;border-right:solid windowtext 1.0pt;background:#00CC00;padding:.75pt .75pt .75pt .75pt;vertical-align:center"><p align="right" class="MsoNormal" style="text-align:right"><span style='font-size:9.0pt;font-family:"Arial",sans-serif'>100<o:p></o:p></span></p></td><td style="border-top:none;border-left:none;border-bottom:solid windowtext 1.0pt;border-right:solid windowtext 1.0pt;background:#00CC00;padding:.75pt .75pt .75pt .75pt;vertical-align:center"><p align="right" class="MsoNormal" style="text-align:right"><span style='font-size:9.0pt;font-family:"Arial",sans-serif'>100<o:p></o:p></span></p></td><td style="border-top:none;border-left:none;border-bottom:solid windowtext 1.0pt;border-right:solid windowtext 1.0pt;background:#00CC00;padding:.75pt .75pt .75pt .75pt;vertical-align:center"><p align="right" class="MsoNormal" style="text-align:right"><span style='font-size:9.0pt;font-family:"Arial",sans-serif'>100<o:p></o:p></span></p></td><td style="border-top:none;border-left:none;border-bottom:solid windowtext 1.0pt;border-right:solid windowtext 1.0pt;background:#00CC00;padding:.75pt .75pt .75pt .75pt;vertical-align:center"><p align="right" class="MsoNormal" style="text-align:right"><span style='font-size:9.0pt;font-family:"Arial",sans-serif'>100<o:p></o:p></span></p></td><td style="border-top:none;border-left:none;border-bottom:solid windowtext 1.0pt;border-right:solid windowtext 1.0pt;background:#00CC00;padding:.75pt .75pt .75pt .75pt;vertical-align:center"><p align="right" class="MsoNormal" style="text-align:right"><span style='font-size:9.0pt;font-family:"Arial",sans-serif'>100<o:p></o:p></span></p></td><td 
style="border-top:none;border-left:none;border-bottom:solid windowtext 1.0pt;border-right:solid windowtext 1.0pt;padding:.75pt .75pt .75pt .75pt;vertical-align:center"><p align="right" class="MsoNormal" style="text-align:right"><span style='font-size:9.0pt;font-family:"Arial",sans-serif'>--<o:p></o:p></span></p></td><td style="border-top:none;border-left:none;border-bottom:solid windowtext 1.0pt;border-right:solid windowtext 1.0pt;background:#00CC00;padding:.75pt .75pt .75pt .75pt;vertical-align:center"><p align="right" class="MsoNormal" style="text-align:right"><span style='font-size:9.0pt;font-family:"Arial",sans-serif'>100<o:p></o:p></span></p></td><td style="border-top:none;border-left:none;border-bottom:solid windowtext 1.0pt;border-right:solid windowtext 1.0pt;background:#00CC00;padding:.75pt .75pt .75pt .75pt;vertical-align:center"><p align="right" class="MsoNormal" style="text-align:right"><span style='font-size:9.0pt;font-family:"Arial",sans-serif'>100<o:p></o:p></span></p></td><td style="border-top:none;border-left:none;border-bottom:solid windowtext 1.0pt;border-right:solid windowtext 1.0pt;background:#00CC00;padding:.75pt .75pt .75pt .75pt;vertical-align:center"><p align="right" class="MsoNormal" style="text-align:right"><span style='font-size:9.0pt;font-family:"Arial",sans-serif'>100<o:p></o:p></span></p></td></tr><tr><td style="border:solid windowtext 1.0pt;border-top:none;background:white;padding:.75pt .75pt .75pt .75pt;vertical-align:center"><p class="MsoNormal"><span style='font-size:9.0pt;font-family:"Arial",sans-serif;color:black'>Update2<o:p></o:p></span></p></td><td style="border-top:none;border-left:none;border-bottom:solid windowtext 1.0pt;border-right:solid windowtext 1.0pt;background:#00CC00;padding:.75pt .75pt .75pt .75pt;vertical-align:center"><p align="right" class="MsoNormal" style="text-align:right"><span style='font-size:9.0pt;font-family:"Arial",sans-serif'>100<o:p></o:p></span></p></td><td 
style="border-top:none;border-left:none;border-bottom:solid windowtext 1.0pt;border-right:solid windowtext 1.0pt;background:#00CC00;padding:.75pt .75pt .75pt .75pt;vertical-align:center"><p align="right" class="MsoNormal" style="text-align:right"><span style='font-size:9.0pt;font-family:"Arial",sans-serif'>100<o:p></o:p></span></p></td><td style="border-top:none;border-left:none;border-bottom:solid windowtext 1.0pt;border-right:solid windowtext 1.0pt;background:#00CC00;padding:.75pt .75pt .75pt .75pt;vertical-align:center"><p align="right" class="MsoNormal" style="text-align:right"><span style='font-size:9.0pt;font-family:"Arial",sans-serif'>100<o:p></o:p></span></p></td><td style="border-top:none;border-left:none;border-bottom:solid windowtext 1.0pt;border-right:solid windowtext 1.0pt;background:#00CC00;padding:.75pt .75pt .75pt .75pt;vertical-align:center"><p align="right" class="MsoNormal" style="text-align:right"><span style='font-size:9.0pt;font-family:"Arial",sans-serif'>100<o:p></o:p></span></p></td><td style="border-top:none;border-left:none;border-bottom:solid windowtext 1.0pt;border-right:solid windowtext 1.0pt;background:#00CC00;padding:.75pt .75pt .75pt .75pt;vertical-align:center"><p align="right" class="MsoNormal" style="text-align:right"><span style='font-size:9.0pt;font-family:"Arial",sans-serif'>100<o:p></o:p></span></p></td><td style="border-top:none;border-left:none;border-bottom:solid windowtext 1.0pt;border-right:solid windowtext 1.0pt;background:#00CC00;padding:.75pt .75pt .75pt .75pt;vertical-align:center"><p align="right" class="MsoNormal" style="text-align:right"><span style='font-size:9.0pt;font-family:"Arial",sans-serif'>100<o:p></o:p></span></p></td><td style="border-top:none;border-left:none;border-bottom:solid windowtext 1.0pt;border-right:solid windowtext 1.0pt;background:#00CC00;padding:.75pt .75pt .75pt .75pt;vertical-align:center"><p align="right" class="MsoNormal" style="text-align:right"><span 
style='font-size:9.0pt;font-family:"Arial",sans-serif'>100<o:p></o:p></span></p></td><td style="border-top:none;border-left:none;border-bottom:solid windowtext 1.0pt;border-right:solid windowtext 1.0pt;padding:.75pt .75pt .75pt .75pt;vertical-align:center"><p align="right" class="MsoNormal" style="text-align:right"><span style='font-size:9.0pt;font-family:"Arial",sans-serif'>--<o:p></o:p></span></p></td><td style="border-top:none;border-left:none;border-bottom:solid windowtext 1.0pt;border-right:solid windowtext 1.0pt;background:#00CC00;padding:.75pt .75pt .75pt .75pt;vertical-align:center"><p align="right" class="MsoNormal" style="text-align:right"><span style='font-size:9.0pt;font-family:"Arial",sans-serif'>100<o:p></o:p></span></p></td><td style="border-top:none;border-left:none;border-bottom:solid windowtext 1.0pt;border-right:solid windowtext 1.0pt;background:#00CC00;padding:.75pt .75pt .75pt .75pt;vertical-align:center"><p align="right" class="MsoNormal" style="text-align:right"><span style='font-size:9.0pt;font-family:"Arial",sans-serif'>100<o:p></o:p></span></p></td><td style="border-top:none;border-left:none;border-bottom:solid windowtext 1.0pt;border-right:solid windowtext 1.0pt;background:#00CC00;padding:.75pt .75pt .75pt .75pt;vertical-align:center"><p align="right" class="MsoNormal" style="text-align:right"><span style='font-size:9.0pt;font-family:"Arial",sans-serif'>100<o:p></o:p></span></p></td></tr></table>
Solution
Try this to remove the tds that have this style:
for t in table1.find_all("td"):
    # .get() avoids a KeyError on cells that have no style attribute
    if "background:white" in t.get("style", ""):
        t.decompose()
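Note that decompose() removes individual cells rather than whole columns, so rows can end up with different lengths. The following is a minimal sketch of that effect on a small made-up table; the sample markup and the helper name strip_white_cells are illustrative assumptions, not part of the question.

```python
from bs4 import BeautifulSoup

# Hypothetical miniature of the table in the question (not the real data).
SAMPLE_HTML = """
<table>
<tr><td style="background:white">KPI</td><td style="background:white">ch</td><td style="background:white">ust</td></tr>
<tr><td style="background:white">Update1</td><td style="background:#00CC00">100</td><td style="padding:.75pt">--</td></tr>
</table>
"""


def strip_white_cells(table):
    """Remove every <td> whose inline style contains background:white."""
    # find_all returns a list snapshot, so removing cells while looping is safe
    for td in table.find_all("td"):
        if "background:white" in td.get("style", ""):
            td.decompose()
    return table


table = BeautifulSoup(SAMPLE_HTML, "html.parser").find("table")
strip_white_cells(table)

# Collect the remaining cell texts row by row
rows = [
    [td.get_text(strip=True) for td in tr.find_all("td")]
    for tr in table.find_all("tr")
]
print(rows)  # [[], ['100', '--']]
```

The header row is emptied completely while the data row only loses its first cell, which is why a column-wise approach is needed when whole columns should disappear.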
To delete the columns whose tds have that style, you can combine BeautifulSoup with pandas:
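from io import StringIO

from bs4 import BeautifulSoup
import pandas as pd

soup = BeautifulSoup(htmlbody, 'html.parser')
table1 = soup.find_all('table')[3]

# Collect the index of every column that contains a cell styled background:white.
# Note: in the table above the whole header row is white, so every index ends up here.
cols = set()
for tr in table1.find_all("tr"):
    for i, td in enumerate(tr.find_all("td")):
        # .get() avoids a KeyError on cells that have no style attribute
        if "background:white" in td.get("style", ""):
            cols.add(i)

# read_html needs markup text or a file-like object, not a Tag
df = pd.read_html(StringIO(str(table1)))[0]
df.drop(df.columns[list(cols)], axis=1, inplace=True)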
I hope I have understood your requirement correctly.
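If the goal is instead to drop the columns whose data cells carry no background declaration at all (the "--" cells in the table above), the columns can be removed directly in the BeautifulSoup tree without pandas. This is only a sketch under assumptions: the helper names has_no_background and drop_columns and the sample markup are made up for illustration, the first row is treated as a header and skipped when deciding, and the table is assumed to have no colspan, rowspan, or nested tables.

```python
from bs4 import BeautifulSoup

# Hypothetical miniature of the table in the question (not the real data).
SAMPLE_HTML = """
<table>
<tr><td style="background:white">KPI</td><td style="background:white">ch</td><td style="background:white">ust</td><td style="background:white">avg</td></tr>
<tr><td style="background:white">Update1</td><td style="background:#00CC00">100</td><td style="padding:.75pt">--</td><td style="background:#00CC00">100</td></tr>
<tr><td style="background:white">Web1</td><td style="background:#00CC00">100</td><td style="padding:.75pt">--</td><td style="background:#00CC00">100</td></tr>
</table>
"""


def has_no_background(td):
    """True when the cell's inline style declares no background."""
    return "background" not in td.get("style", "")


def drop_columns(table, predicate, skip_rows=1):
    """Delete each column where predicate(td) holds for some cell
    below the first skip_rows rows; return the removed indexes."""
    rows = [tr.find_all("td") for tr in table.find_all("tr")]
    doomed = {
        i
        for cells in rows[skip_rows:]
        for i, td in enumerate(cells)
        if predicate(td)
    }
    # rows holds snapshots of the cells, so the indexes stay valid while deleting
    for cells in rows:
        for i in doomed:
            if i < len(cells):
                cells[i].decompose()
    return sorted(doomed)


table = BeautifulSoup(SAMPLE_HTML, "html.parser").find("table")
removed = drop_columns(table, has_no_background)

remaining = [
    [td.get_text(strip=True) for td in tr.find_all("td")]
    for tr in table.find_all("tr")
]
print(removed)    # [2]
print(remaining)  # [['KPI', 'ch', 'avg'], ['Update1', '100', '100'], ['Web1', '100', '100']]
```

Passing a different predicate, for example one that tests for "background:white", reuses the same column removal for the other reading of the question.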