python-3.x - Get specific tables from multiple tables in an HTML file using Python
Problem description
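Hello, I have tried asking this before but could not explain it properly. I get a report in HTML format from an application, and the file is saved in my local directory. The HTML file contains many tables, but I only want to extract specific ones. Because the file is large, I have posted a small part of it in the snippet below so the structure can be understood.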
<?xml version="1.0" encoding="utf-16"?>
<html>
<head>
<META http-equiv="Content-Type" content="text/html; charset=utf-8" />
</head>
<body>
<table cellspacing="0" cellpadding="0" width="100%" border="0" style="border-collapse: collapse;">
<tr>
<td style="border:none; padding: 0px;font-family: Tahoma;font-size: 12px;">
<table cellspacing="0" cellpadding="0" width="100%" border="0" style="border-collapse: collapse;">
<tr style="height:70px">
<td style="width: 80%;border: none;background-color: #fb9895;color: White;font-weight: bold;font-size: 16px;height: 70px;vertical-align: bottom;padding: 0 0 17px 15px;font-family: Tahoma;">
Backup job: MUMHOILNDDB01 Backup 1
<div class="jobDescription" style="margin-top: 5px;font-size: 12px;"></div>
</td>
<td style="border: none;padding: 0px;font-family: Tahoma;font-size: 12px;background-color: #fb9895;color: White;font-weight: bold;font-size: 16px;height: 70px;vertical-align: bottom;padding: 0 0 17px 15px;font-family: Tahoma;">
Error
<div class="jobDescription" style="margin-top: 5px;font-size: 12px;">1
of
1
hosts processed
</div>
</td>
</tr>
<tr>
<td colspan="2" style="border: none; padding: 0px;font-family: Tahoma;font-size: 12px;">
<table width="100%" cellspacing="0" cellpadding="0" class="inner" border="0" style="margin: 0px;border-collapse: collapse;">
<tr style="height: 17px;">
<td colspan="9" class="sessionDetails" style="border-style: solid; border-color:#a7a9ac; border-width: 1px 1px 0 1px;height: 35px;background-color: #f3f4f4;font-size: 16px;vertical-align: middle;padding: 5px 0 0 15px;color: #626365; font-family: Tahoma;"><span>Tuesday, August 4, 2020 11:00:17 AM</span></td>
</tr>
<tr style="height: 17px;">
<td nowrap="" style="width: 1%;padding: 2px 3px 2px 3px;vertical-align: top;border: 1px solid #a7a9ac;font-family: Tahoma;font-size: 12px;"><b>Success</b></td>
<td nowrap="" style="width:85px;padding: 2px 3px 2px 3px;vertical-align: top;border: 1px solid #a7a9ac;font-family: Tahoma;font-size: 12px;">0</td>
<td nowrap="" style="width:85px;padding: 2px 3px 2px 3px;vertical-align: top;border: 1px solid #a7a9ac;font-family: Tahoma;font-size: 12px;"><b>Start time</b></td>
<td nowrap="" style="width:85px;padding: 2px 3px 2px 3px;vertical-align: top;border: 1px solid #a7a9ac;font-family: Tahoma;font-size: 12px;">11:00:17 AM</td>
<td nowrap="" style="width:85px;padding: 2px 3px 2px 3px;vertical-align: top;border: 1px solid #a7a9ac;font-family: Tahoma;font-size: 12px;"><b>Total size</b></td>
<td nowrap="" style="width:85px;padding: 2px 3px 2px 3px;vertical-align: top;border: 1px solid #a7a9ac;font-family: Tahoma;font-size: 12px;">0 B</td>
<td nowrap="" style="width:85px;padding: 2px 3px 2px 3px;vertical-align: top;border: 1px solid #a7a9ac;font-family: Tahoma;font-size: 12px;"><b>Backup size</b></td>
<td nowrap="" style="width:85px;padding: 2px 3px 2px 3px;vertical-align: top;border: 1px solid #a7a9ac;font-family: Tahoma;font-size: 12px;">0 B</td>
<td rowspan="3" style="border: 1px solid #a7a9ac;font-family: Tahoma;font-size: 12px;vertical-align: top;"><span class="small_label" style="font-size: 10px;"> </span></td>
</tr>
<tr style="height: 17px;">
<td nowrap="" style="padding: 2px 3px 2px 3px;vertical-align: top;border: 1px solid #a7a9ac;font-family: Tahoma;font-size: 12px;"><b>Warning</b></td>
<td style="padding: 2px 3px 2px 3px;vertical-align: top;border: 1px solid #a7a9ac;font-family: Tahoma;font-size: 12px;">0</td>
<td nowrap="" style="padding: 2px 3px 2px 3px;vertical-align: top;border: 1px solid #a7a9ac;font-family: Tahoma;font-size: 12px;"><b>End time</b></td>
<td nowrap="" style="padding: 2px 3px 2px 3px;vertical-align: top;border: 1px solid #a7a9ac;font-family: Tahoma;font-size: 12px;">11:00:41 AM</td>
<td nowrap="" style="padding: 2px 3px 2px 3px;vertical-align: top;border: 1px solid #a7a9ac;font-family: Tahoma;font-size: 12px;"><b>Data read</b></td>
<td nowrap="" style="padding: 2px 3px 2px 3px;vertical-align: top;border: 1px solid #a7a9ac;font-family: Tahoma;font-size: 12px;">0 B</td>
<td nowrap="" style="padding: 2px 3px 2px 3px;vertical-align: top;border: 1px solid #a7a9ac;font-family: Tahoma;font-size: 12px;"><b>Dedupe</b></td>
<td nowrap="" style="padding: 2px 3px 2px 3px;vertical-align: top;border: 1px solid #a7a9ac;font-family: Tahoma;font-size: 12px;">1.0x</td>
</tr>
<tr style="height: 17px;">
<td nowrap="" style="padding: 2px 3px 2px 3px;vertical-align: top;border: 1px solid #a7a9ac;font-family: Tahoma;font-size: 12px;"><b>Error</b></td>
<td style="padding: 2px 3px 2px 3px;vertical-align: top;border: 1px solid #a7a9ac;font-family: Tahoma;font-size: 12px;">1</td>
<td nowrap="" style="padding: 2px 3px 2px 3px;vertical-align: top;border: 1px solid #a7a9ac;font-family: Tahoma;font-size: 12px;"><b>Duration</b></td>
<td nowrap="" style="padding: 2px 3px 2px 3px;vertical-align: top;border: 1px solid #a7a9ac;font-family: Tahoma;font-size: 12px;">0:00:24</td>
<td nowrap="" style="padding: 2px 3px 2px 3px;vertical-align: top;border: 1px solid #a7a9ac;font-family: Tahoma;font-size: 12px;"><b>Transferred</b></td>
<td nowrap="" style="padding: 2px 3px 2px 3px;vertical-align: top;border: 1px solid #a7a9ac;font-family: Tahoma;font-size: 12px;">0 B</td>
<td nowrap="" style="padding: 2px 3px 2px 3px;vertical-align: top;border: 1px solid #a7a9ac;font-family: Tahoma;font-size: 12px;"><b>Compression</b></td>
<td nowrap="" style="padding: 2px 3px 2px 3px;vertical-align: top;border: 1px solid #a7a9ac;font-family: Tahoma;font-size: 12px;">1.0x</td>
</tr>
<tr style="height: 17px;">
<td colspan="9" nowrap="" style="height: 35px;background-color: #f3f4f4;font-size: 16px;vertical-align: middle;padding: 5px 0 0 15px;color: #626365; font-family: Tahoma;border: 1px solid #a7a9ac;">
Details
</td>
</tr>
<tr class="processObjectsHeader" style="height: 23px">
<td nowrap="" style="background-color: #e3e3e3;padding: 2px 3px 2px 3px;vertical-align: top;border: 1px solid #a7a9ac;border-top: none;font-family: Tahoma;font-size: 12px;"><b>Name</b></td>
<td nowrap="" style="background-color: #e3e3e3;padding: 2px 3px 2px 3px;vertical-align: top;border: 1px solid #a7a9ac;border-top: none;font-family: Tahoma;font-size: 12px;"><b>Status</b></td>
<td nowrap="" style="background-color: #e3e3e3;padding: 2px 3px 2px 3px;vertical-align: top;border: 1px solid #a7a9ac;border-top: none;font-family: Tahoma;font-size: 12px;"><b>Start time</b></td>
<td nowrap="" style="background-color: #e3e3e3;padding: 2px 3px 2px 3px;vertical-align: top;border: 1px solid #a7a9ac;border-top: none;font-family: Tahoma;font-size: 12px;"><b>End time</b></td>
<td nowrap="" style="background-color: #e3e3e3;padding: 2px 3px 2px 3px;vertical-align: top;border: 1px solid #a7a9ac;border-top: none;font-family: Tahoma;font-size: 12px;"><b>Size</b></td>
<td nowrap="" style="background-color: #e3e3e3;padding: 2px 3px 2px 3px;vertical-align: top;border: 1px solid #a7a9ac;border-top: none;font-family: Tahoma;font-size: 12px;"><b>Read</b></td>
<td nowrap="" style="width:1%;background-color: #e3e3e3;padding: 2px 3px 2px 3px;vertical-align: top;border: 1px solid #a7a9ac;border-top: none;font-family: Tahoma;font-size: 12px;"><b>Transferred</b></td>
<td nowrap="" style="background-color: #e3e3e3;padding: 2px 3px 2px 3px;vertical-align: top;border: 1px solid #a7a9ac;border-top: none;font-family: Tahoma;font-size: 12px;"><b>Duration</b></td>
<td style="background-color: #e3e3e3;padding: 2px 3px 2px 3px;vertical-align: top;border: 1px solid #a7a9ac;border-top: none;font-family: Tahoma;font-size: 12px;"><b>Details</b></td>
</tr>
<tr style="height: 17px;">
<td nowrap="" style="padding: 2px 3px 2px 3px;vertical-align: top;border: 1px solid #a7a9ac;font-family: Tahoma;font-size: 12px;">MUMHOILNDDB01</td>
<td nowrap="" style="padding: 2px 3px 2px 3px;vertical-align: top;border: 1px solid #a7a9ac;font-family: Tahoma;font-size: 12px;"><span style="color: #FF0000;">Error</span></td>
<td nowrap="" style="padding: 2px 3px 2px 3px;vertical-align: top;border: 1px solid #a7a9ac;font-family: Tahoma;font-size: 12px;">11:00:19 AM</td>
<td nowrap="" style="padding: 2px 3px 2px 3px;vertical-align: top;border: 1px solid #a7a9ac;font-family: Tahoma;font-size: 12px;">11:00:41 AM</td>
<td nowrap="" style="padding: 2px 3px 2px 3px;vertical-align: top;border: 1px solid #a7a9ac;font-family: Tahoma;font-size: 12px;">0 B</td>
<td nowrap="" style="padding: 2px 3px 2px 3px;vertical-align: top;border: 1px solid #a7a9ac;font-family: Tahoma;font-size: 12px;">0 B</td>
<td nowrap="" style="padding: 2px 3px 2px 3px;vertical-align: top;border: 1px solid #a7a9ac;font-family: Tahoma;font-size: 12px;">0 B</td>
<td nowrap="" style="padding: 2px 3px 2px 3px;vertical-align: top;border: 1px solid #a7a9ac;font-family: Tahoma;font-size: 12px;">0:00:21</td>
<td style="padding: 2px 3px 2px 3px;vertical-align: top;border: 1px solid #a7a9ac;font-family: Tahoma;font-size: 12px;"><span class="small_label" style="font-size: 10px;">Backup job has failed<br />Backup task has been failed<br />Processing finished with errors at 2020-08-04 11:00:42 GMT</span></td>
</tr>
</table>
</td>
</tr>
</table>
</td>
</tr>
<tr>
<td> </td>
</tr>
<tr>
<td style="border:none; padding: 0px;font-family: Tahoma;font-size: 12px;">
<table cellspacing="0" cellpadding="0" width="100%" border="0" style="border-collapse: collapse;">
<tr style="height:70px">
<td style="width: 80%;border: none;background-color: #fb9895;color: White;font-weight: bold;font-size: 16px;height: 70px;vertical-align: bottom;padding: 0 0 17px 15px;font-family: Tahoma;">
Backup job: MUMHOISAPHIRE01 Backup 1
<div class="jobDescription" style="margin-top: 5px;font-size: 12px;"></div>
</td>
<td style="border: none;padding: 0px;font-family: Tahoma;font-size: 12px;background-color: #fb9895;color: White;font-weight: bold;font-size: 16px;height: 70px;vertical-align: bottom;padding: 0 0 17px 15px;font-family: Tahoma;">
Error
<div class="jobDescription" style="margin-top: 5px;font-size: 12px;">1
of
1
hosts processed
</div>
</td>
</tr>
<tr>
<td colspan="2" style="border: none; padding: 0px;font-family: Tahoma;font-size: 12px;">
<table width="100%" cellspacing="0" cellpadding="0" class="inner" border="0" style="margin: 0px;border-collapse: collapse;">
<tr style="height: 17px;">
<td colspan="9" class="sessionDetails" style="border-style: solid; border-color:#a7a9ac; border-width: 1px 1px 0 1px;height: 35px;background-color: #f3f4f4;font-size: 16px;vertical-align: middle;padding: 5px 0 0 15px;color: #626365; font-family: Tahoma;"><span>Tuesday, August 4, 2020 10:59:56 AM</span></td>
</tr>
<tr style="height: 17px;">
<td nowrap="" style="width: 1%;padding: 2px 3px 2px 3px;vertical-align: top;border: 1px solid #a7a9ac;font-family: Tahoma;font-size: 12px;"><b>Success</b></td>
<td nowrap="" style="width:85px;padding: 2px 3px 2px 3px;vertical-align: top;border: 1px solid #a7a9ac;font-family: Tahoma;font-size: 12px;">0</td>
<td nowrap="" style="width:85px;padding: 2px 3px 2px 3px;vertical-align: top;border: 1px solid #a7a9ac;font-family: Tahoma;font-size: 12px;"><b>Start time</b></td>
<td nowrap="" style="width:85px;padding: 2px 3px 2px 3px;vertical-align: top;border: 1px solid #a7a9ac;font-family: Tahoma;font-size: 12px;">10:59:56 AM</td>
<td nowrap="" style="width:85px;padding: 2px 3px 2px 3px;vertical-align: top;border: 1px solid #a7a9ac;font-family: Tahoma;font-size: 12px;"><b>Total size</b></td>
<td nowrap="" style="width:85px;padding: 2px 3px 2px 3px;vertical-align: top;border: 1px solid #a7a9ac;font-family: Tahoma;font-size: 12px;">0 B</td>
<td nowrap="" style="width:85px;padding: 2px 3px 2px 3px;vertical-align: top;border: 1px solid #a7a9ac;font-family: Tahoma;font-size: 12px;"><b>Backup size</b></td>
<td nowrap="" style="width:85px;padding: 2px 3px 2px 3px;vertical-align: top;border: 1px solid #a7a9ac;font-family: Tahoma;font-size: 12px;">0 B</td>
<td rowspan="3" style="border: 1px solid #a7a9ac;font-family: Tahoma;font-size: 12px;vertical-align: top;"><span class="small_label" style="font-size: 10px;"> </span></td>
</tr>
<tr style="height: 17px;">
<td nowrap="" style="padding: 2px 3px 2px 3px;vertical-align: top;border: 1px solid #a7a9ac;font-family: Tahoma;font-size: 12px;"><b>Warning</b></td>
<td style="padding: 2px 3px 2px 3px;vertical-align: top;border: 1px solid #a7a9ac;font-family: Tahoma;font-size: 12px;">0</td>
<td nowrap="" style="padding: 2px 3px 2px 3px;vertical-align: top;border: 1px solid #a7a9ac;font-family: Tahoma;font-size: 12px;"><b>End time</b></td>
<td nowrap="" style="padding: 2px 3px 2px 3px;vertical-align: top;border: 1px solid #a7a9ac;font-family: Tahoma;font-size: 12px;">11:00:20 AM</td>
<td nowrap="" style="padding: 2px 3px 2px 3px;vertical-align: top;border: 1px solid #a7a9ac;font-family: Tahoma;font-size: 12px;"><b>Data read</b></td>
<td nowrap="" style="padding: 2px 3px 2px 3px;vertical-align: top;border: 1px solid #a7a9ac;font-family: Tahoma;font-size: 12px;">0 B</td>
<td nowrap="" style="padding: 2px 3px 2px 3px;vertical-align: top;border: 1px solid #a7a9ac;font-family: Tahoma;font-size: 12px;"><b>Dedupe</b></td>
<td nowrap="" style="padding: 2px 3px 2px 3px;vertical-align: top;border: 1px solid #a7a9ac;font-family: Tahoma;font-size: 12px;">1.0x</td>
</tr>
<tr style="height: 17px;">
<td nowrap="" style="padding: 2px 3px 2px 3px;vertical-align: top;border: 1px solid #a7a9ac;font-family: Tahoma;font-size: 12px;"><b>Error</b></td>
<td style="padding: 2px 3px 2px 3px;vertical-align: top;border: 1px solid #a7a9ac;font-family: Tahoma;font-size: 12px;">1</td>
<td nowrap="" style="padding: 2px 3px 2px 3px;vertical-align: top;border: 1px solid #a7a9ac;font-family: Tahoma;font-size: 12px;"><b>Duration</b></td>
<td nowrap="" style="padding: 2px 3px 2px 3px;vertical-align: top;border: 1px solid #a7a9ac;font-family: Tahoma;font-size: 12px;">0:00:24</td>
<td nowrap="" style="padding: 2px 3px 2px 3px;vertical-align: top;border: 1px solid #a7a9ac;font-family: Tahoma;font-size: 12px;"><b>Transferred</b></td>
<td nowrap="" style="padding: 2px 3px 2px 3px;vertical-align: top;border: 1px solid #a7a9ac;font-family: Tahoma;font-size: 12px;">0 B</td>
<td nowrap="" style="padding: 2px 3px 2px 3px;vertical-align: top;border: 1px solid #a7a9ac;font-family: Tahoma;font-size: 12px;"><b>Compression</b></td>
<td nowrap="" style="padding: 2px 3px 2px 3px;vertical-align: top;border: 1px solid #a7a9ac;font-family: Tahoma;font-size: 12px;">1.0x</td>
</tr>
<tr style="height: 17px;">
<td colspan="9" nowrap="" style="height: 35px;background-color: #f3f4f4;font-size: 16px;vertical-align: middle;padding: 5px 0 0 15px;color: #626365; font-family: Tahoma;border: 1px solid #a7a9ac;">
Details
</td>
</tr>
<tr class="processObjectsHeader" style="height: 23px">
<td nowrap="" style="background-color: #e3e3e3;padding: 2px 3px 2px 3px;vertical-align: top;border: 1px solid #a7a9ac;border-top: none;font-family: Tahoma;font-size: 12px;"><b>Name</b></td>
<td nowrap="" style="background-color: #e3e3e3;padding: 2px 3px 2px 3px;vertical-align: top;border: 1px solid #a7a9ac;border-top: none;font-family: Tahoma;font-size: 12px;"><b>Status</b></td>
<td nowrap="" style="background-color: #e3e3e3;padding: 2px 3px 2px 3px;vertical-align: top;border: 1px solid #a7a9ac;border-top: none;font-family: Tahoma;font-size: 12px;"><b>Start time</b></td>
<td nowrap="" style="background-color: #e3e3e3;padding: 2px 3px 2px 3px;vertical-align: top;border: 1px solid #a7a9ac;border-top: none;font-family: Tahoma;font-size: 12px;"><b>End time</b></td>
<td nowrap="" style="background-color: #e3e3e3;padding: 2px 3px 2px 3px;vertical-align: top;border: 1px solid #a7a9ac;border-top: none;font-family: Tahoma;font-size: 12px;"><b>Size</b></td>
<td nowrap="" style="background-color: #e3e3e3;padding: 2px 3px 2px 3px;vertical-align: top;border: 1px solid #a7a9ac;border-top: none;font-family: Tahoma;font-size: 12px;"><b>Read</b></td>
<td nowrap="" style="width:1%;background-color: #e3e3e3;padding: 2px 3px 2px 3px;vertical-align: top;border: 1px solid #a7a9ac;border-top: none;font-family: Tahoma;font-size: 12px;"><b>Transferred</b></td>
<td nowrap="" style="background-color: #e3e3e3;padding: 2px 3px 2px 3px;vertical-align: top;border: 1px solid #a7a9ac;border-top: none;font-family: Tahoma;font-size: 12px;"><b>Duration</b></td>
<td style="background-color: #e3e3e3;padding: 2px 3px 2px 3px;vertical-align: top;border: 1px solid #a7a9ac;border-top: none;font-family: Tahoma;font-size: 12px;"><b>Details</b></td>
</tr>
<tr style="height: 17px;">
<td nowrap="" style="padding: 2px 3px 2px 3px;vertical-align: top;border: 1px solid #a7a9ac;font-family: Tahoma;font-size: 12px;">MUMHOISAPHIRE01</td>
<td nowrap="" style="padding: 2px 3px 2px 3px;vertical-align: top;border: 1px solid #a7a9ac;font-family: Tahoma;font-size: 12px;"><span style="color: #FF0000;">Error</span></td>
<td nowrap="" style="padding: 2px 3px 2px 3px;vertical-align: top;border: 1px solid #a7a9ac;font-family: Tahoma;font-size: 12px;">10:59:58 AM</td>
<td nowrap="" style="padding: 2px 3px 2px 3px;vertical-align: top;border: 1px solid #a7a9ac;font-family: Tahoma;font-size: 12px;">11:00:20 AM</td>
<td nowrap="" style="padding: 2px 3px 2px 3px;vertical-align: top;border: 1px solid #a7a9ac;font-family: Tahoma;font-size: 12px;">0 B</td>
<td nowrap="" style="padding: 2px 3px 2px 3px;vertical-align: top;border: 1px solid #a7a9ac;font-family: Tahoma;font-size: 12px;">0 B</td>
<td nowrap="" style="padding: 2px 3px 2px 3px;vertical-align: top;border: 1px solid #a7a9ac;font-family: Tahoma;font-size: 12px;">0 B</td>
<td nowrap="" style="padding: 2px 3px 2px 3px;vertical-align: top;border: 1px solid #a7a9ac;font-family: Tahoma;font-size: 12px;">0:00:22</td>
<td style="padding: 2px 3px 2px 3px;vertical-align: top;border: 1px solid #a7a9ac;font-family: Tahoma;font-size: 12px;"><span class="small_label" style="font-size: 10px;">Backup job has failed<br />Backup task has been failed<br />Processing finished with errors at 2020-08-04 11:00:22 GMT</span></td>
</tr>
</table>
</td>
</tr>
</table>
</td>
</tr>
<tr>
<td> </td>
</tr>
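If you run the snippet, you can see four tables, because the HTML file actually consists of multiple tables. What they have in common is that several of them share the headers Name, Start time, End time and Status. I am trying to extract every table in the HTML file that has the headers Name, Start time, End time and Status, and export them to CSV; as you will see when you run the snippet, there are multiple tables with these headers. I wrote the script below, but it gives me everything in Excel without proper formatting.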
import pandas as pd
url = "table1.html"
goal = pd.read_html(url)[0]
goal.to_csv("data.csv")
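But I cannot get the data out. I have also tried "bs4" with "find_all" on a class, but as you can see the HTML is fairly complex, so any ideas are appreciated.
PS: I am new to programming and to these platforms. Thank you for your help!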
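One way to do what the question describes (keep only the tables whose header row contains Name, Status, Start time and End time, then write them to CSV) is to walk the HTML directly. The sketch below uses only the standard library's html.parser and csv modules instead of pandas or bs4; the names WANTED, DetailsTableParser, extract_details and to_csv_text are hypothetical. It assumes the report keeps the structure shown in the snippet, with each wanted header row inside the innermost table of a job. Because text is credited only to the innermost open cell, the outer layout tables do not repeat the nested rows.

```python
import csv
import io
from html.parser import HTMLParser

# Header labels that identify the wanted tables (taken from the question).
WANTED = {"Name", "Status", "Start time", "End time"}


class DetailsTableParser(HTMLParser):
    """Collect data rows of every <table> whose header row contains WANTED.

    Text is credited only to the innermost open cell, so an outer layout
    table never repeats the contents of the tables nested inside it.
    """

    def __init__(self):
        super().__init__()
        self.frames = []   # one {"rows": [...], "cell": list or None} per open <table>
        self.header = None
        self.rows = []

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.frames.append({"rows": [], "cell": None})
        elif not self.frames:
            return
        elif tag == "tr":
            self.frames[-1]["rows"].append([])
        elif tag in ("td", "th"):
            self.frames[-1]["cell"] = []
        elif tag == "br" and self.frames[-1]["cell"] is not None:
            self.frames[-1]["cell"].append(" ")  # keep <br /> as a word break

    def handle_data(self, data):
        if self.frames and self.frames[-1]["cell"] is not None:
            self.frames[-1]["cell"].append(data)

    def handle_endtag(self, tag):
        if not self.frames:
            return
        frame = self.frames[-1]
        if tag in ("td", "th") and frame["cell"] is not None:
            text = " ".join("".join(frame["cell"]).split())  # collapse whitespace
            if frame["rows"]:
                frame["rows"][-1].append(text)
            frame["cell"] = None
        elif tag == "table":
            self.frames.pop()
            self._keep_if_wanted(frame["rows"])

    def _keep_if_wanted(self, rows):
        # Keep the rows after the first row that contains every WANTED label.
        for i, row in enumerate(rows):
            if WANTED <= set(row):
                self.header = self.header or row
                self.rows.extend(r for r in rows[i + 1:] if len(r) == len(row))
                return


def extract_details(html):
    parser = DetailsTableParser()
    parser.feed(html)
    parser.close()
    return parser.header, parser.rows


def to_csv_text(header, rows):
    buf = io.StringIO()
    writer = csv.writer(buf)
    if header:
        writer.writerow(header)
    writer.writerows(rows)
    return buf.getvalue()
```

For example, read the file with open("table1.html", encoding="utf-8"), pass its contents to extract_details, and write to_csv_text(header, rows) to a file opened with open("data.csv", "w", newline=""). The snippet's XML declaration says utf-16 while its META tag says utf-8, so the encoding is an assumption to check against the real file. In bs4 terms, the header rows in the snippet also carry class="processObjectsHeader", so soup.find_all("tr", class_="processObjectsHeader") followed by find_next_siblings("tr") on each match would be another starting point.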
Solution
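import pandas as pd

url = "table1.html"

# read_html returns one DataFrame per <table> (nested tables included);
# pd.concat replaces DataFrame.append, which was removed in pandas 2.0
df_all = pd.concat(pd.read_html(url))
# drop rows whose column 0 holds the flattened text of a whole nested table
df_all = df_all[df_all[0].str.len() < 30]
df_all = df_all.reset_index(drop=True)
# drop the summary rows and the "Details" banner row
df_all = df_all[~df_all[0].isin(['Success', 'Warning', 'Error', 'Details'])]
# the first remaining row is the Name/Status/... header row
df_all.columns = df_all.iloc[0]
# remove the header rows repeated by the other tables
df_all = df_all[df_all.Name != 'Name']
df_all = df_all.reset_index(drop=True)
print(df_all)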
4 Name Status Start time End time Size Read Transferred Duration Details
0 MUMHOILNDDB01 Error 11:00:19 AM 11:00:41 AM 0 B 0 B 0 B 0:00:21 Backup job has failedBackup task has been failedProcessing finished with errors at 2020-08-04 11:00:42 GMT
1 MUMHOISAPHIRE01 Error 10:59:58 AM 11:00:20 AM 0 B 0 B 0 B 0:00:22 Backup job has failedBackup task has been failedProcessing finished with errors at 2020-08-04 11:00:22 GMT
2 MUMHOILNDDB01 Error 11:00:19 AM 11:00:41 AM 0 B 0 B 0 B 0:00:21 Backup job has failedBackup task has been failedProcessing finished with errors at 2020-08-04 11:00:42 GMT
3 MUMHOILNDDB01 Error 11:00:19 AM 11:00:41 AM 0 B 0 B 0 B 0:00:21 Backup job has failedBackup task has been failedProcessing finished with errors at 2020-08-04 11:00:42 GMT
4 MUMHOISAPHIRE01 Error 10:59:58 AM 11:00:20 AM 0 B 0 B 0 B 0:00:22 Backup job has failedBackup task has been failedProcessing finished with errors at 2020-08-04 11:00:22 GMT
5 MUMHOISAPHIRE01 Error 10:59:58 AM 11:00:20 AM 0 B 0 B 0 B 0:00:22 Backup job has failedBackup task has been failedProcessing finished with errors at 2020-08-04 11:00:22 GMT
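The output above lists each job three times. A plausible cause: read_html returns one DataFrame per table, nested ones included, and each table's rows appear to be collected with a descendant search, so a row nested three tables deep is reported once per enclosing table. The standard-library sketch below (EnclosingTableCounter and enclosing_tables are hypothetical names) counts how many tables enclose a given cell text, as a way to check that hypothesis against your own file.

```python
from html.parser import HTMLParser


class EnclosingTableCounter(HTMLParser):
    """Record how many <table> elements enclose each text node equal to `needle`."""

    def __init__(self, needle):
        super().__init__()
        self.needle = needle
        self.depth = 0   # number of currently open <table> elements
        self.hits = []   # enclosing-table count for each matching text node

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == "table":
            self.depth -= 1

    def handle_data(self, data):
        if data.strip() == self.needle:
            self.hits.append(self.depth)


def enclosing_tables(html, needle):
    counter = EnclosingTableCounter(needle)
    counter.feed(html)
    counter.close()
    return counter.hits
```

In the question's snippet, the MUMHOILNDDB01 data cell is enclosed by three tables (the outer layout table, the per-job table and the "inner" table), which matches the three copies. If that is the cause, df_all.drop_duplicates().reset_index(drop=True) should reduce the result to one row per job.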
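Edit: "Dedupe" and "Compression" sit inside the long strings in column 0; in the previous example I removed those strings with the str.len() < 30 filter.
If you want these values, you need a different approach, and other data can be extracted from these strings as well.
Compression example: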
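import pandas as pd

url = "table1.html"

df_all = pd.concat(pd.read_html(url))
# keep only the long column-0 strings that contain a whole job summary;
# .copy() avoids pandas' SettingWithCopyWarning on the assignment below
df_all = df_all[df_all[0].str.len() > 50].copy()
# work with regex: capture the number followed by "x" after the label, e.g. "1.0x"
# (a greedy ".*x" would run on to the last "x" anywhere in the string)
df_all['Compression'] = df_all[0].str.extract(r'Compression\s+(\d+(?:\.\d+)?x)')[0]
print(df_all['Compression'].reset_index(drop=True))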
0 1.0x
1 1.0x
2 1.0x
3 1.0x
4 1.0x
5 1.0x
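The same str.extract idea extends to the other summary values embedded in those long column-0 strings. A minimal standard-library sketch, assuming the flattened text reads "label value" as in the snippet (for example "Dedupe 1.0x" or "Duration 0:00:24") and that sizes use the units B, KB, MB, GB or TB; LABELS, VALUE and extract_summary are hypothetical names.

```python
import re

# Assumed value shapes in the flattened summary text: sizes such as "0 B" or
# "1.5 GB", ratios such as "1.0x", and durations such as "0:00:24".
VALUE = r"(\d+(?:\.\d+)?\s*(?:[KMGT]?B)\b|\d+(?:\.\d+)?x|\d+:\d{2}:\d{2})"

LABELS = ["Total size", "Backup size", "Data read", "Transferred",
          "Dedupe", "Compression", "Duration"]


def extract_summary(text):
    """Return {label: value} for each label followed by a recognizable value.

    re.search returns the first match, so labels that also appear in the
    later Name/Status header row resolve to the summary values before it.
    """
    found = {}
    for label in LABELS:
        m = re.search(re.escape(label) + r"\s+" + VALUE, text)
        if m:
            found[label] = m.group(1)
    return found
```

With pandas, something like df_all[0].map(extract_summary) would give one dict per row, and pd.DataFrame(list(...)) over that result would turn the dicts into columns.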