Getting specific tables out of an HTML file that contains multiple tables, using Python

Problem description

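Hello, I tried asking this before but could not explain it properly. I get a report in HTML format from an application. The file is saved in my local directory and contains many tables, but I want to extract only specific tables from it. Because the HTML file is large, I am posting a small part of it as a snippet for reference.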

<?xml version="1.0" encoding="utf-16"?>
<html>
   <head>
      <META http-equiv="Content-Type" content="text/html; charset=utf-8" />
   </head>
   <body>
      <table cellspacing="0" cellpadding="0" width="100%" border="0" style="border-collapse: collapse;">
         <tr>
            <td style="border:none; padding: 0px;font-family: Tahoma;font-size: 12px;">
               <table cellspacing="0" cellpadding="0" width="100%" border="0" style="border-collapse: collapse;">
                  <tr style="height:70px">
                     <td style="width: 80%;border: none;background-color: #fb9895;color: White;font-weight: bold;font-size: 16px;height: 70px;vertical-align: bottom;padding: 0 0 17px 15px;font-family: Tahoma;">
                        Backup job: MUMHOILNDDB01 Backup 1 
                        <div class="jobDescription" style="margin-top: 5px;font-size: 12px;"></div>
                     </td>
                     <td style="border: none;padding: 0px;font-family: Tahoma;font-size: 12px;background-color: #fb9895;color: White;font-weight: bold;font-size: 16px;height: 70px;vertical-align: bottom;padding: 0 0 17px 15px;font-family: Tahoma;">
                        Error
                        <div class="jobDescription" style="margin-top: 5px;font-size: 12px;">1
                           of
                           1
                           hosts processed
                        </div>
                     </td>
                  </tr>
                  <tr>
                     <td colspan="2" style="border: none; padding: 0px;font-family: Tahoma;font-size: 12px;">
                        <table width="100%" cellspacing="0" cellpadding="0" class="inner" border="0" style="margin: 0px;border-collapse: collapse;">
                           <tr style="height: 17px;">
                              <td colspan="9" class="sessionDetails" style="border-style: solid; border-color:#a7a9ac; border-width: 1px 1px 0 1px;height: 35px;background-color: #f3f4f4;font-size: 16px;vertical-align: middle;padding: 5px 0 0 15px;color: #626365; font-family: Tahoma;"><span>Tuesday, August 4, 2020 11:00:17 AM</span></td>
                           </tr>
                           <tr style="height: 17px;">
                              <td nowrap="" style="width: 1%;padding: 2px 3px 2px 3px;vertical-align: top;border: 1px solid #a7a9ac;font-family: Tahoma;font-size: 12px;"><b>Success</b></td>
                              <td nowrap="" style="width:85px;padding: 2px 3px 2px 3px;vertical-align: top;border: 1px solid #a7a9ac;font-family: Tahoma;font-size: 12px;">0</td>
                              <td nowrap="" style="width:85px;padding: 2px 3px 2px 3px;vertical-align: top;border: 1px solid #a7a9ac;font-family: Tahoma;font-size: 12px;"><b>Start time</b></td>
                              <td nowrap="" style="width:85px;padding: 2px 3px 2px 3px;vertical-align: top;border: 1px solid #a7a9ac;font-family: Tahoma;font-size: 12px;">11:00:17 AM</td>
                              <td nowrap="" style="width:85px;padding: 2px 3px 2px 3px;vertical-align: top;border: 1px solid #a7a9ac;font-family: Tahoma;font-size: 12px;"><b>Total size</b></td>
                              <td nowrap="" style="width:85px;padding: 2px 3px 2px 3px;vertical-align: top;border: 1px solid #a7a9ac;font-family: Tahoma;font-size: 12px;">0 B</td>
                              <td nowrap="" style="width:85px;padding: 2px 3px 2px 3px;vertical-align: top;border: 1px solid #a7a9ac;font-family: Tahoma;font-size: 12px;"><b>Backup size</b></td>
                              <td nowrap="" style="width:85px;padding: 2px 3px 2px 3px;vertical-align: top;border: 1px solid #a7a9ac;font-family: Tahoma;font-size: 12px;">0 B</td>
                              <td rowspan="3" style="border: 1px solid #a7a9ac;font-family: Tahoma;font-size: 12px;vertical-align: top;"><span class="small_label" style="font-size: 10px;"> </span></td>
                           </tr>
                           <tr style="height: 17px;">
                              <td nowrap="" style="padding: 2px 3px 2px 3px;vertical-align: top;border: 1px solid #a7a9ac;font-family: Tahoma;font-size: 12px;"><b>Warning</b></td>
                              <td style="padding: 2px 3px 2px 3px;vertical-align: top;border: 1px solid #a7a9ac;font-family: Tahoma;font-size: 12px;">0</td>
                              <td nowrap="" style="padding: 2px 3px 2px 3px;vertical-align: top;border: 1px solid #a7a9ac;font-family: Tahoma;font-size: 12px;"><b>End time</b></td>
                              <td nowrap="" style="padding: 2px 3px 2px 3px;vertical-align: top;border: 1px solid #a7a9ac;font-family: Tahoma;font-size: 12px;">11:00:41 AM</td>
                              <td nowrap="" style="padding: 2px 3px 2px 3px;vertical-align: top;border: 1px solid #a7a9ac;font-family: Tahoma;font-size: 12px;"><b>Data read</b></td>
                              <td nowrap="" style="padding: 2px 3px 2px 3px;vertical-align: top;border: 1px solid #a7a9ac;font-family: Tahoma;font-size: 12px;">0 B</td>
                              <td nowrap="" style="padding: 2px 3px 2px 3px;vertical-align: top;border: 1px solid #a7a9ac;font-family: Tahoma;font-size: 12px;"><b>Dedupe</b></td>
                              <td nowrap="" style="padding: 2px 3px 2px 3px;vertical-align: top;border: 1px solid #a7a9ac;font-family: Tahoma;font-size: 12px;">1.0x</td>
                           </tr>
                           <tr style="height: 17px;">
                              <td nowrap="" style="padding: 2px 3px 2px 3px;vertical-align: top;border: 1px solid #a7a9ac;font-family: Tahoma;font-size: 12px;"><b>Error</b></td>
                              <td style="padding: 2px 3px 2px 3px;vertical-align: top;border: 1px solid #a7a9ac;font-family: Tahoma;font-size: 12px;">1</td>
                              <td nowrap="" style="padding: 2px 3px 2px 3px;vertical-align: top;border: 1px solid #a7a9ac;font-family: Tahoma;font-size: 12px;"><b>Duration</b></td>
                              <td nowrap="" style="padding: 2px 3px 2px 3px;vertical-align: top;border: 1px solid #a7a9ac;font-family: Tahoma;font-size: 12px;">0:00:24</td>
                              <td nowrap="" style="padding: 2px 3px 2px 3px;vertical-align: top;border: 1px solid #a7a9ac;font-family: Tahoma;font-size: 12px;"><b>Transferred</b></td>
                              <td nowrap="" style="padding: 2px 3px 2px 3px;vertical-align: top;border: 1px solid #a7a9ac;font-family: Tahoma;font-size: 12px;">0 B</td>
                              <td nowrap="" style="padding: 2px 3px 2px 3px;vertical-align: top;border: 1px solid #a7a9ac;font-family: Tahoma;font-size: 12px;"><b>Compression</b></td>
                              <td nowrap="" style="padding: 2px 3px 2px 3px;vertical-align: top;border: 1px solid #a7a9ac;font-family: Tahoma;font-size: 12px;">1.0x</td>
                           </tr>
                           <tr style="height: 17px;">
                              <td colspan="9" nowrap="" style="height: 35px;background-color: #f3f4f4;font-size: 16px;vertical-align: middle;padding: 5px 0 0 15px;color: #626365; font-family: Tahoma;border: 1px solid #a7a9ac;">
                                 Details
                              </td>
                           </tr>
                           <tr class="processObjectsHeader" style="height: 23px">
                              <td nowrap="" style="background-color: #e3e3e3;padding: 2px 3px 2px 3px;vertical-align: top;border: 1px solid #a7a9ac;border-top: none;font-family: Tahoma;font-size: 12px;"><b>Name</b></td>
                              <td nowrap="" style="background-color: #e3e3e3;padding: 2px 3px 2px 3px;vertical-align: top;border: 1px solid #a7a9ac;border-top: none;font-family: Tahoma;font-size: 12px;"><b>Status</b></td>
                              <td nowrap="" style="background-color: #e3e3e3;padding: 2px 3px 2px 3px;vertical-align: top;border: 1px solid #a7a9ac;border-top: none;font-family: Tahoma;font-size: 12px;"><b>Start time</b></td>
                              <td nowrap="" style="background-color: #e3e3e3;padding: 2px 3px 2px 3px;vertical-align: top;border: 1px solid #a7a9ac;border-top: none;font-family: Tahoma;font-size: 12px;"><b>End time</b></td>
                              <td nowrap="" style="background-color: #e3e3e3;padding: 2px 3px 2px 3px;vertical-align: top;border: 1px solid #a7a9ac;border-top: none;font-family: Tahoma;font-size: 12px;"><b>Size</b></td>
                              <td nowrap="" style="background-color: #e3e3e3;padding: 2px 3px 2px 3px;vertical-align: top;border: 1px solid #a7a9ac;border-top: none;font-family: Tahoma;font-size: 12px;"><b>Read</b></td>
                              <td nowrap="" style="width:1%;background-color: #e3e3e3;padding: 2px 3px 2px 3px;vertical-align: top;border: 1px solid #a7a9ac;border-top: none;font-family: Tahoma;font-size: 12px;"><b>Transferred</b></td>
                              <td nowrap="" style="background-color: #e3e3e3;padding: 2px 3px 2px 3px;vertical-align: top;border: 1px solid #a7a9ac;border-top: none;font-family: Tahoma;font-size: 12px;"><b>Duration</b></td>
                              <td style="background-color: #e3e3e3;padding: 2px 3px 2px 3px;vertical-align: top;border: 1px solid #a7a9ac;border-top: none;font-family: Tahoma;font-size: 12px;"><b>Details</b></td>
                           </tr>
                           <tr style="height: 17px;">
                              <td nowrap="" style="padding: 2px 3px 2px 3px;vertical-align: top;border: 1px solid #a7a9ac;font-family: Tahoma;font-size: 12px;">MUMHOILNDDB01</td>
                              <td nowrap="" style="padding: 2px 3px 2px 3px;vertical-align: top;border: 1px solid #a7a9ac;font-family: Tahoma;font-size: 12px;"><span style="color: #FF0000;">Error</span></td>
                              <td nowrap="" style="padding: 2px 3px 2px 3px;vertical-align: top;border: 1px solid #a7a9ac;font-family: Tahoma;font-size: 12px;">11:00:19 AM</td>
                              <td nowrap="" style="padding: 2px 3px 2px 3px;vertical-align: top;border: 1px solid #a7a9ac;font-family: Tahoma;font-size: 12px;">11:00:41 AM</td>
                              <td nowrap="" style="padding: 2px 3px 2px 3px;vertical-align: top;border: 1px solid #a7a9ac;font-family: Tahoma;font-size: 12px;">0 B</td>
                              <td nowrap="" style="padding: 2px 3px 2px 3px;vertical-align: top;border: 1px solid #a7a9ac;font-family: Tahoma;font-size: 12px;">0 B</td>
                              <td nowrap="" style="padding: 2px 3px 2px 3px;vertical-align: top;border: 1px solid #a7a9ac;font-family: Tahoma;font-size: 12px;">0 B</td>
                              <td nowrap="" style="padding: 2px 3px 2px 3px;vertical-align: top;border: 1px solid #a7a9ac;font-family: Tahoma;font-size: 12px;">0:00:21</td>
                              <td style="padding: 2px 3px 2px 3px;vertical-align: top;border: 1px solid #a7a9ac;font-family: Tahoma;font-size: 12px;"><span class="small_label" style="font-size: 10px;">Backup job has failed<br />Backup task has been failed<br />Processing finished with errors at 2020-08-04 11:00:42 GMT</span></td>
                           </tr>
                        </table>
                     </td>
                  </tr>
               </table>
            </td>
         </tr>
         <tr>
            <td> </td>
         </tr>
         <tr>
            <td style="border:none; padding: 0px;font-family: Tahoma;font-size: 12px;">
               <table cellspacing="0" cellpadding="0" width="100%" border="0" style="border-collapse: collapse;">
                  <tr style="height:70px">
                     <td style="width: 80%;border: none;background-color: #fb9895;color: White;font-weight: bold;font-size: 16px;height: 70px;vertical-align: bottom;padding: 0 0 17px 15px;font-family: Tahoma;">
                        Backup job: MUMHOISAPHIRE01 Backup 1 
                        <div class="jobDescription" style="margin-top: 5px;font-size: 12px;"></div>
                     </td>
                     <td style="border: none;padding: 0px;font-family: Tahoma;font-size: 12px;background-color: #fb9895;color: White;font-weight: bold;font-size: 16px;height: 70px;vertical-align: bottom;padding: 0 0 17px 15px;font-family: Tahoma;">
                        Error
                        <div class="jobDescription" style="margin-top: 5px;font-size: 12px;">1
                           of
                           1
                           hosts processed
                        </div>
                     </td>
                  </tr>
                  <tr>
                     <td colspan="2" style="border: none; padding: 0px;font-family: Tahoma;font-size: 12px;">
                        <table width="100%" cellspacing="0" cellpadding="0" class="inner" border="0" style="margin: 0px;border-collapse: collapse;">
                           <tr style="height: 17px;">
                              <td colspan="9" class="sessionDetails" style="border-style: solid; border-color:#a7a9ac; border-width: 1px 1px 0 1px;height: 35px;background-color: #f3f4f4;font-size: 16px;vertical-align: middle;padding: 5px 0 0 15px;color: #626365; font-family: Tahoma;"><span>Tuesday, August 4, 2020 10:59:56 AM</span></td>
                           </tr>
                           <tr style="height: 17px;">
                              <td nowrap="" style="width: 1%;padding: 2px 3px 2px 3px;vertical-align: top;border: 1px solid #a7a9ac;font-family: Tahoma;font-size: 12px;"><b>Success</b></td>
                              <td nowrap="" style="width:85px;padding: 2px 3px 2px 3px;vertical-align: top;border: 1px solid #a7a9ac;font-family: Tahoma;font-size: 12px;">0</td>
                              <td nowrap="" style="width:85px;padding: 2px 3px 2px 3px;vertical-align: top;border: 1px solid #a7a9ac;font-family: Tahoma;font-size: 12px;"><b>Start time</b></td>
                              <td nowrap="" style="width:85px;padding: 2px 3px 2px 3px;vertical-align: top;border: 1px solid #a7a9ac;font-family: Tahoma;font-size: 12px;">10:59:56 AM</td>
                              <td nowrap="" style="width:85px;padding: 2px 3px 2px 3px;vertical-align: top;border: 1px solid #a7a9ac;font-family: Tahoma;font-size: 12px;"><b>Total size</b></td>
                              <td nowrap="" style="width:85px;padding: 2px 3px 2px 3px;vertical-align: top;border: 1px solid #a7a9ac;font-family: Tahoma;font-size: 12px;">0 B</td>
                              <td nowrap="" style="width:85px;padding: 2px 3px 2px 3px;vertical-align: top;border: 1px solid #a7a9ac;font-family: Tahoma;font-size: 12px;"><b>Backup size</b></td>
                              <td nowrap="" style="width:85px;padding: 2px 3px 2px 3px;vertical-align: top;border: 1px solid #a7a9ac;font-family: Tahoma;font-size: 12px;">0 B</td>
                              <td rowspan="3" style="border: 1px solid #a7a9ac;font-family: Tahoma;font-size: 12px;vertical-align: top;"><span class="small_label" style="font-size: 10px;"> </span></td>
                           </tr>
                           <tr style="height: 17px;">
                              <td nowrap="" style="padding: 2px 3px 2px 3px;vertical-align: top;border: 1px solid #a7a9ac;font-family: Tahoma;font-size: 12px;"><b>Warning</b></td>
                              <td style="padding: 2px 3px 2px 3px;vertical-align: top;border: 1px solid #a7a9ac;font-family: Tahoma;font-size: 12px;">0</td>
                              <td nowrap="" style="padding: 2px 3px 2px 3px;vertical-align: top;border: 1px solid #a7a9ac;font-family: Tahoma;font-size: 12px;"><b>End time</b></td>
                              <td nowrap="" style="padding: 2px 3px 2px 3px;vertical-align: top;border: 1px solid #a7a9ac;font-family: Tahoma;font-size: 12px;">11:00:20 AM</td>
                              <td nowrap="" style="padding: 2px 3px 2px 3px;vertical-align: top;border: 1px solid #a7a9ac;font-family: Tahoma;font-size: 12px;"><b>Data read</b></td>
                              <td nowrap="" style="padding: 2px 3px 2px 3px;vertical-align: top;border: 1px solid #a7a9ac;font-family: Tahoma;font-size: 12px;">0 B</td>
                              <td nowrap="" style="padding: 2px 3px 2px 3px;vertical-align: top;border: 1px solid #a7a9ac;font-family: Tahoma;font-size: 12px;"><b>Dedupe</b></td>
                              <td nowrap="" style="padding: 2px 3px 2px 3px;vertical-align: top;border: 1px solid #a7a9ac;font-family: Tahoma;font-size: 12px;">1.0x</td>
                           </tr>
                           <tr style="height: 17px;">
                              <td nowrap="" style="padding: 2px 3px 2px 3px;vertical-align: top;border: 1px solid #a7a9ac;font-family: Tahoma;font-size: 12px;"><b>Error</b></td>
                              <td style="padding: 2px 3px 2px 3px;vertical-align: top;border: 1px solid #a7a9ac;font-family: Tahoma;font-size: 12px;">1</td>
                              <td nowrap="" style="padding: 2px 3px 2px 3px;vertical-align: top;border: 1px solid #a7a9ac;font-family: Tahoma;font-size: 12px;"><b>Duration</b></td>
                              <td nowrap="" style="padding: 2px 3px 2px 3px;vertical-align: top;border: 1px solid #a7a9ac;font-family: Tahoma;font-size: 12px;">0:00:24</td>
                              <td nowrap="" style="padding: 2px 3px 2px 3px;vertical-align: top;border: 1px solid #a7a9ac;font-family: Tahoma;font-size: 12px;"><b>Transferred</b></td>
                              <td nowrap="" style="padding: 2px 3px 2px 3px;vertical-align: top;border: 1px solid #a7a9ac;font-family: Tahoma;font-size: 12px;">0 B</td>
                              <td nowrap="" style="padding: 2px 3px 2px 3px;vertical-align: top;border: 1px solid #a7a9ac;font-family: Tahoma;font-size: 12px;"><b>Compression</b></td>
                              <td nowrap="" style="padding: 2px 3px 2px 3px;vertical-align: top;border: 1px solid #a7a9ac;font-family: Tahoma;font-size: 12px;">1.0x</td>
                           </tr>
                           <tr style="height: 17px;">
                              <td colspan="9" nowrap="" style="height: 35px;background-color: #f3f4f4;font-size: 16px;vertical-align: middle;padding: 5px 0 0 15px;color: #626365; font-family: Tahoma;border: 1px solid #a7a9ac;">
                                 Details
                              </td>
                           </tr>
                           <tr class="processObjectsHeader" style="height: 23px">
                              <td nowrap="" style="background-color: #e3e3e3;padding: 2px 3px 2px 3px;vertical-align: top;border: 1px solid #a7a9ac;border-top: none;font-family: Tahoma;font-size: 12px;"><b>Name</b></td>
                              <td nowrap="" style="background-color: #e3e3e3;padding: 2px 3px 2px 3px;vertical-align: top;border: 1px solid #a7a9ac;border-top: none;font-family: Tahoma;font-size: 12px;"><b>Status</b></td>
                              <td nowrap="" style="background-color: #e3e3e3;padding: 2px 3px 2px 3px;vertical-align: top;border: 1px solid #a7a9ac;border-top: none;font-family: Tahoma;font-size: 12px;"><b>Start time</b></td>
                              <td nowrap="" style="background-color: #e3e3e3;padding: 2px 3px 2px 3px;vertical-align: top;border: 1px solid #a7a9ac;border-top: none;font-family: Tahoma;font-size: 12px;"><b>End time</b></td>
                              <td nowrap="" style="background-color: #e3e3e3;padding: 2px 3px 2px 3px;vertical-align: top;border: 1px solid #a7a9ac;border-top: none;font-family: Tahoma;font-size: 12px;"><b>Size</b></td>
                              <td nowrap="" style="background-color: #e3e3e3;padding: 2px 3px 2px 3px;vertical-align: top;border: 1px solid #a7a9ac;border-top: none;font-family: Tahoma;font-size: 12px;"><b>Read</b></td>
                              <td nowrap="" style="width:1%;background-color: #e3e3e3;padding: 2px 3px 2px 3px;vertical-align: top;border: 1px solid #a7a9ac;border-top: none;font-family: Tahoma;font-size: 12px;"><b>Transferred</b></td>
                              <td nowrap="" style="background-color: #e3e3e3;padding: 2px 3px 2px 3px;vertical-align: top;border: 1px solid #a7a9ac;border-top: none;font-family: Tahoma;font-size: 12px;"><b>Duration</b></td>
                              <td style="background-color: #e3e3e3;padding: 2px 3px 2px 3px;vertical-align: top;border: 1px solid #a7a9ac;border-top: none;font-family: Tahoma;font-size: 12px;"><b>Details</b></td>
                           </tr>
                           <tr style="height: 17px;">
                              <td nowrap="" style="padding: 2px 3px 2px 3px;vertical-align: top;border: 1px solid #a7a9ac;font-family: Tahoma;font-size: 12px;">MUMHOISAPHIRE01</td>
                              <td nowrap="" style="padding: 2px 3px 2px 3px;vertical-align: top;border: 1px solid #a7a9ac;font-family: Tahoma;font-size: 12px;"><span style="color: #FF0000;">Error</span></td>
                              <td nowrap="" style="padding: 2px 3px 2px 3px;vertical-align: top;border: 1px solid #a7a9ac;font-family: Tahoma;font-size: 12px;">10:59:58 AM</td>
                              <td nowrap="" style="padding: 2px 3px 2px 3px;vertical-align: top;border: 1px solid #a7a9ac;font-family: Tahoma;font-size: 12px;">11:00:20 AM</td>
                              <td nowrap="" style="padding: 2px 3px 2px 3px;vertical-align: top;border: 1px solid #a7a9ac;font-family: Tahoma;font-size: 12px;">0 B</td>
                              <td nowrap="" style="padding: 2px 3px 2px 3px;vertical-align: top;border: 1px solid #a7a9ac;font-family: Tahoma;font-size: 12px;">0 B</td>
                              <td nowrap="" style="padding: 2px 3px 2px 3px;vertical-align: top;border: 1px solid #a7a9ac;font-family: Tahoma;font-size: 12px;">0 B</td>
                              <td nowrap="" style="padding: 2px 3px 2px 3px;vertical-align: top;border: 1px solid #a7a9ac;font-family: Tahoma;font-size: 12px;">0:00:22</td>
                              <td style="padding: 2px 3px 2px 3px;vertical-align: top;border: 1px solid #a7a9ac;font-family: Tahoma;font-size: 12px;"><span class="small_label" style="font-size: 10px;">Backup job has failed<br />Backup task has been failed<br />Processing finished with errors at 2020-08-04 11:00:22 GMT</span></td>
                           </tr>
                        </table>
                     </td>
                  </tr>
               </table>
            </td>
         </tr>
         <tr>
            <td> </td>
         </tr>

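If you run the snippet you will see four tables; the actual HTML file consists of many more. They have one thing in common: several of the tables have the headers Name, Start time, End time, and Status. I am trying to extract every table with those headers from the HTML file and export them to CSV. I wrote the script below, but it gives me everything, and the result is badly formatted in Excel.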

import pandas as pd 
url = "table1.html"
goal = pd.read_html(url)[0]

goal.to_csv("data.csv")

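But I cannot get the data I want. I have also tried "bs4" with "find_all" and a class filter, but as you can see the HTML is somewhat complicated, so any ideas would be appreciated.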

PS: I am new to programming and to these platforms. Thanks for your help!

Tags: python-3.x, pandas, beautifulsoup, html-table

Solution


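import pandas as pd

url = "table1.html"

# read_html returns one DataFrame per <table> it finds in the file
tables = pd.read_html(url)
# stack them; pd.concat replaces DataFrame.append, removed in pandas 2.0
df_all = pd.concat(tables)
# drop rows whose first cell is 30+ characters long
# (the date banner and the long flattened strings)
df_all = df_all[df_all[0].str.len() < 30]
df_all = df_all.reset_index(drop=True)
# drop the summary rows and the "Details" banner
df_all = df_all[~df_all[0].isin(['Success', 'Warning', 'Error', 'Details'])]
# use the first remaining row (Name, Status, ...) as the column names
df_all.columns = df_all.iloc[0]
# remove the repeated header rows
df_all = df_all[df_all.Name != 'Name']
df_all = df_all.reset_index(drop=True)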

print(df_all)
4             Name Status   Start time     End time Size Read Transferred Duration                                                                                                     Details
0    MUMHOILNDDB01  Error  11:00:19 AM  11:00:41 AM  0 B  0 B         0 B  0:00:21  Backup job has failedBackup task has been failedProcessing finished with errors at 2020-08-04 11:00:42 GMT
1  MUMHOISAPHIRE01  Error  10:59:58 AM  11:00:20 AM  0 B  0 B         0 B  0:00:22  Backup job has failedBackup task has been failedProcessing finished with errors at 2020-08-04 11:00:22 GMT
2    MUMHOILNDDB01  Error  11:00:19 AM  11:00:41 AM  0 B  0 B         0 B  0:00:21  Backup job has failedBackup task has been failedProcessing finished with errors at 2020-08-04 11:00:42 GMT
3    MUMHOILNDDB01  Error  11:00:19 AM  11:00:41 AM  0 B  0 B         0 B  0:00:21  Backup job has failedBackup task has been failedProcessing finished with errors at 2020-08-04 11:00:42 GMT
4  MUMHOISAPHIRE01  Error  10:59:58 AM  11:00:20 AM  0 B  0 B         0 B  0:00:22  Backup job has failedBackup task has been failedProcessing finished with errors at 2020-08-04 11:00:22 GMT
5  MUMHOISAPHIRE01  Error  10:59:58 AM  11:00:20 AM  0 B  0 B         0 B  0:00:22  Backup job has failedBackup task has been failedProcessing finished with errors at 2020-08-04 11:00:22 GMT
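
Another way to get only the Name / Status / Start time / End time blocks is to parse the HTML directly instead of filtering the frames afterwards. In the posted snippet each of those blocks starts with a <tr class="processObjectsHeader"> row, so the sketch below keys on that class. It uses only the standard library. The DetailsParser class, the extract_details helper, and the trimmed SAMPLE markup are made up for illustration, and joining <br /> line breaks with "; " is a choice of this sketch, not something taken from the report.

```python
import csv
import io
from html.parser import HTMLParser


class DetailsParser(HTMLParser):
    """Collect the rows that follow each <tr class="processObjectsHeader">."""

    def __init__(self):
        super().__init__()
        self.header = None           # column names of the current block
        self.rows = []               # one dict per extracted data row
        self.in_details = False      # True between a header row and </table>
        self.reading_header = False  # True while inside the header row itself
        self.cells = None            # cells of the row being read, if any
        self.text = None             # text pieces of the cell being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            classes = (dict(attrs).get("class") or "").split()
            self.reading_header = "processObjectsHeader" in classes
            if self.reading_header or self.in_details:
                self.cells = []
        elif tag == "td" and self.cells is not None:
            self.text = []
        elif tag == "br" and self.text is not None:
            self.text.append("; ")  # assumption: keep line breaks visible

    def handle_data(self, data):
        if self.text is not None:
            self.text.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self.text is not None:
            # collapse runs of whitespace inside the cell
            self.cells.append(" ".join("".join(self.text).split()))
            self.text = None
        elif tag == "tr" and self.cells is not None:
            if self.reading_header:
                self.header = self.cells
                self.in_details = True
            else:
                self.rows.append(dict(zip(self.header, self.cells)))
            self.cells = None
            self.reading_header = False
        elif tag == "table":
            self.in_details = False  # the block ends with its table


def extract_details(html):
    parser = DetailsParser()
    parser.feed(html)
    parser.close()
    return parser.rows


# Hypothetical, trimmed-down stand-in for the real report.
SAMPLE = """
<table><tr><td>
  <table class="inner">
    <tr><td colspan="3"><span>Tuesday, August 4, 2020 11:00:17 AM</span></td></tr>
    <tr><td><b>Success</b></td><td>0</td><td><b>Start time</b></td></tr>
    <tr><td colspan="3">Details</td></tr>
    <tr class="processObjectsHeader">
      <td><b>Name</b></td><td><b>Status</b></td><td><b>Details</b></td>
    </tr>
    <tr>
      <td>MUMHOILNDDB01</td>
      <td><span style="color: #FF0000;">Error</span></td>
      <td><span>Backup job has failed<br />Backup task has been failed</span></td>
    </tr>
  </table>
</td></tr><tr><td>
  <table class="inner">
    <tr class="processObjectsHeader">
      <td><b>Name</b></td><td><b>Status</b></td><td><b>Details</b></td>
    </tr>
    <tr>
      <td>MUMHOISAPHIRE01</td>
      <td><span>Error</span></td>
      <td><span>Backup job has failed</span></td>
    </tr>
  </table>
</td></tr></table>
"""

rows = extract_details(SAMPLE)

# Write the rows as CSV (to an in-memory buffer here, to keep the sketch tidy).
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=list(rows[0]))
writer.writeheader()
writer.writerows(rows)
print(buffer.getvalue())
```

For the real report, the file contents would be passed to extract_details instead of SAMPLE, and the writer pointed at a file such as data.csv. The snippet's XML declaration says utf-16 while its meta tag says utf-8, so the encoding given to open() would need checking against the actual file.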

Edit: "Dedupe" and "Compression" sit inside a long string in column 0; in the previous example I dropped those rows with the < 30 length filter.

If you want those values, you need a different approach; other data can be pulled out of those strings in the same way.

Compression example:

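import pandas as pd

url = "table1.html"

# pd.concat replaces DataFrame.append, which was removed in pandas 2.0
df_all = pd.concat(pd.read_html(url))
# keep only the long flattened strings that contain the summary values
df_all = df_all[df_all[0].str.len() > 50].copy()

# use a regex to pull the value that follows "Compression" out of each string
df_all['Compression'] = df_all[0].str.extract(r'Compression (.*x)')[0]
print(df_all['Compression'].reset_index(drop=True))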
0     1.0x
1     1.0x
2     1.0x
3     1.0x
4     1.0x
5     1.0x
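
The same regex idea can be extended to pull several values in one pass. The sketch below is a minimal example under assumptions: the flattened strings are invented stand-ins (the real text produced by read_html may use different spacing), the names flattened, fields and summary are illustrative, and the pattern [\d.]+x is a tighter guess than (.*x), which is greedy and would run on to the last "x" in the string.

```python
import pandas as pd

# Hypothetical flattened strings, standing in for the long values in column 0.
flattened = pd.Series([
    "Warning 0 End time 11:00:41 AM Data read 0 B Dedupe 1.0x "
    "Error 1 Duration 0:00:24 Transferred 0 B Compression 1.0x",
    "Warning 0 End time 11:00:20 AM Data read 0 B Dedupe 2.5x "
    "Error 1 Duration 0:00:24 Transferred 0 B Compression 1.8x",
])

# One column per field: the label, optional whitespace, then digits/dots and "x".
fields = ["Dedupe", "Compression"]
summary = pd.DataFrame({
    name: flattened.str.extract(rf"{name}\s*([\d.]+x)", expand=False)
    for name in fields
})

print(summary)
```

Extracting each field with its own pattern keeps the result independent of the order in which the labels appear in the string.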
