How to extract an HTML table with JS and save it as JSON?

Problem description

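I am trying to extract the full contents of a table with Selenium IDE. It works fine for small tables, but for large tables it takes a long time. With Selenium IDE I can extract the table's HTML structure and store it in a variable as a string. I would like to know whether it would be faster to use JavaScript to read the table's contents and save them as JSON.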

The HTML table will be stored in a variable: var htmlTable = '<table id="customers"> ... </table>'

<table id="customers">
  <tbody><tr>
    <th>Company</th>
    <th>Contact</th>
    <th>Country</th>
  </tr>
  <tr>
    <td>Alfreds Futterkiste</td>
    <td>Maria Anders</td>
    <td>Germany</td>
  </tr>
  <tr>
    <td>Centro comercial Moctezuma</td>
    <td>Francisco Chang</td>
    <td>Mexico</td>
  </tr>
  <tr>
    <td>Ernst Handel</td>
    <td>Roland Mendel</td>
    <td>Austria</td>
  </tr>
  <tr>
    <td>Island Trading</td>
    <td>Helen Bennett</td>
    <td>UK</td>
  </tr>
  <tr>
    <td>Laughing Bacchus Winecellars</td>
    <td>Yoshi Tannamuri</td>
    <td>Canada</td>
  </tr>
  <tr>
    <td>Magazzini Alimentari Riuniti</td>
    <td>Giovanni Rovelli</td>
    <td>Italy</td>
  </tr>
</tbody></table>

For example, the data should be saved like this:

[{
        "Company": "Alfreds Futterkiste",
        "Contact": "Maria Anders",
        "Country": "Germany"
    },
    {
        "Company": "Centro comercial Moctezuma",
        "Contact": "Francisco Chang",
        "Country": "Mexico"
    },
    {
        "Company": "Ernst Handel",
        "Contact": "Roland Mendel",
        "Country": "Austria"
    },
    {
        "Company": "Island Trading",
        "Contact": "Helen Bennett",
        "Country": "UK"
    },
    {
        "Company": "Laughing Bacchus Winecellars",
        "Contact": "Yoshi Tannamuri",
        "Country": "Canada"
    },
    {
        "Company": "Magazzini Alimentari Riuniti",
        "Contact": "Giovanni Rovelli",
        "Country": "Italy"
    }
]

Tags: javascript, html, json

Solution


One way to do it:

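// Header cells live in <thead>, data rows in <tbody> (see the markup below).
// Note: the table in the question keeps its header row inside <tbody>,
// so it would need a <thead> for these selectors to match.
const cTCols = document.querySelector('table#customers thead')
const cTdata = document.querySelector('table#customers tbody')

// Column names come from the header row, trimmed of padding whitespace.
const colsName = [...cTCols.rows[0].cells].map(c => c.textContent.trim())

// Build one object per body row, keyed by column name.
const result = [...cTdata.rows].map(({cells}) =>
  [...cells].reduce((o, {textContent: val}, i) =>
    (o[colsName[i]] = val.trim(), o), {}))

console.log(result)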
/* Demo styles for the snippet */
body {
  font-family : Arial, Helvetica, sans-serif;
  font-size   : 14px;
  }
table  {
  border-collapse : collapse;
  margin          : 1em;
  }
td,th  {
  padding : .2em .8em;
  border  : 1px solid darkblue;
  }
thead { 
  background-color : aquamarine;
  }
<!-- Demo markup that the script above runs against -->
<table id="customers">
  <thead>
    <tr><th> Company </th><th> Contact </th><th> Country </th></tr>
  </thead>
  <tbody>
    <tr><td> Alfreds Futterkiste          </td><td> Maria Anders     </td><td> Germany </td></tr>
    <tr><td> Centro comercial Moctezuma   </td><td> Francisco Chang  </td><td> Mexico  </td></tr>
    <tr><td> Ernst Handel                 </td><td> Roland Mendel    </td><td> Austria </td></tr>
    <tr><td> Island Trading               </td><td> Helen Bennett    </td><td> UK      </td></tr>
    <tr><td> Laughing Bacchus Winecellars </td><td> Yoshi Tannamuri  </td><td> Canada  </td></tr>
    <tr><td> Magazzini Alimentari Riuniti </td><td> Giovanni Rovelli </td><td> Italy   </td></tr>
  </tbody>
</table>
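
The question starts from an HTML string whose header row sits inside <tbody>, so the selectors above would not match it directly. The following is a minimal sketch of one way to cover that case, assuming the first row of the table always holds the column names. The helper names `tableToObjects` and `htmlTableToJson` are hypothetical, chosen only for illustration. `DOMParser` is a browser API, so the string-parsing part is meant for a page context, such as a script run from the browser.

```javascript
// Hypothetical helper: turn a table (or any object exposing rows/cells with
// textContent) into an array of plain objects, one per data row.
function tableToObjects(table) {
  const rows = [...table.rows];
  if (rows.length === 0) return [];

  // Assumption: the first row is the header, whether it is in <thead> or <tbody>.
  const [headerRow, ...dataRows] = rows;
  const keys = [...headerRow.cells].map(cell => cell.textContent.trim());

  return dataRows.map(row => {
    const record = {};
    [...row.cells].forEach((cell, i) => {
      // Fall back to a positional key if a row has more cells than the header.
      const key = keys[i] ?? `column${i + 1}`;
      record[key] = cell.textContent.trim();
    });
    return record;
  });
}

// Hypothetical helper (browser only): parse the HTML string without attaching
// it to the page, then serialize the rows as indented JSON text.
function htmlTableToJson(htmlTable) {
  const doc = new DOMParser().parseFromString(htmlTable, 'text/html');
  const table = doc.querySelector('table');
  if (!table) throw new Error('No <table> found in the given HTML string');
  return JSON.stringify(tableToObjects(table), null, 4);
}

// Usage, guarded so the file also loads where DOMParser is unavailable.
if (typeof DOMParser !== 'undefined') {
  const htmlTable =
    '<table id="customers"><tbody>' +
    '<tr><th>Company</th><th>Contact</th><th>Country</th></tr>' +
    '<tr><td>Island Trading</td><td>Helen Bennett</td><td>UK</td></tr>' +
    '</tbody></table>';
  console.log(htmlTableToJson(htmlTable));
}
```

Keeping the row-to-object conversion separate from the parsing step means the same function works on a table already in the page (for example `tableToObjects(document.querySelector('table#customers'))`) and on one parsed from a string. Whether this is faster than reading cells through Selenium IDE would need to be measured on the real table.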

