python - Pandas read_html does not read the text correctly
Problem description
I have the following text:
text = """<table class="table table-striped">\n <thead>\n <tr>\n <th data-field="placement">Placement</th>\n <th data-field="production">Production</th>\n <th data-field="application">Eng.Vol.</th>\n <th data-field="body">Body No</th>\n <th data-field="eng">Eng No</th>\n <th data-field="eng">Notes</th>\n </tr>\n <tr>\n <td data-field="placement">Front Stabilizer</td>\n <td data-field="production">Oct 16~</td>\n <td data-field="application">1.5 L</td>\n <td data-field="body">HRW18</td>\n <td data-field="eng">L15BY</td>\n <td data-field="note" class="">\n Pos:Left/Right </td>\n </tr>\n <tr>\n <td data-field="placement">Front Stabilizer</td>\n <td data-field="production">Oct 16~</td>\n <td data-field="application">1.5 L</td>\n <td data-field="body">HRW18 LHD</td>\n <td data-field="eng">L15BY</td>\n <td data-field="note" class="">\n Pos:Left/Right </td>\n </tr>\n <tr>\n <td data-field="placement">Front Stabilizer</td>\n <td data-field="production">Oct 16~</td>\n <td data-field="application">1.5 L</td>\n <td data-field="body">HRW28</td>\n <td data-field="eng">L15BY</td>\n <td data-field="note" class="">\n Pos:Left/Right </td>\n </tr>\n <tr>\n <td data-field="placement">Front Stabilizer</td>\n <td data-field="production">Oct 16~</td>\n <td data-field="application">2.0 L</td>\n <td data-field="body">HRW38 RHD</td>\n <td data-field="eng">R20A9</td>\n <td data-field="note" class="">\n Pos:Left/Right </td>\n </tr>\n </thead>\n </table>"""
This HTML is properly closed with a table tag and contains all the required tags, yet pandas still does not read it as a table.
Code:
import pandas as pd
pd.read_html(text)
Output:
[Empty DataFrame
Columns: [(Placement, Front Stabilizer, Front Stabilizer, Front Stabilizer, Front Stabilizer), (Production, Oct 16~, Oct 16~, Oct 16~, Oct 16~), (Eng.Vol., 1.5 L, 1.5 L, 1.5 L, 2.0 L), (Body No, HRW18, HRW18 LHD, HRW28, HRW38 RHD), (Eng No, L15BY, L15BY, L15BY, R20A9), (Notes, Pos:Left/Right, Pos:Left/Right, Pos:Left/Right, Pos:Left/Right)]
Index: []]
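Solution
Your whole table, data rows included, is wrapped in <thead></thead>, so it is understandable that pandas interprets every row as a column header. The data is still there, stored in the levels of the column MultiIndex, so we can pull it back out: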
tmp=pd.read_html(text)[0]
pd.DataFrame(tmp.columns.to_frame().values)
Output:
    0           1                 2                 3                 4
--  ----------  ----------------  ----------------  ----------------  ----------------
 0  Placement   Front Stabilizer  Front Stabilizer  Front Stabilizer  Front Stabilizer
 1  Production  Oct 16~           Oct 16~           Oct 16~           Oct 16~
 2  Eng.Vol.    1.5 L             1.5 L             1.5 L             2.0 L
 3  Body No     HRW18             HRW18 LHD         HRW28             HRW38 RHD
 4  Eng No      L15BY             L15BY             L15BY             R20A9
 5  Notes       Pos:Left/Right    Pos:Left/Right    Pos:Left/Right    Pos:Left/Right
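An alternative is to repair the markup before pandas sees it, so that only the real header row stays in <thead> and the data rows end up in a <tbody>. The following is a minimal sketch of that idea using only the standard library. The helper name move_data_rows_to_tbody is hypothetical (it is not part of pandas), and the regex approach assumes a single, non-nested table like the one in the question.

```python
import re

def move_data_rows_to_tbody(html: str) -> str:
    """Keep the first <tr> in <thead>; move the remaining rows into a new <tbody>.

    Hypothetical helper for illustration. Assumes one <thead> and no nested tables.
    """
    match = re.search(r"<thead\b[^>]*>(.*?)</thead>", html,
                      flags=re.DOTALL | re.IGNORECASE)
    if match is None:
        return html  # no <thead> at all, nothing to repair

    # Collect every complete <tr>...</tr> found inside the <thead>.
    rows = re.findall(r"<tr\b.*?</tr>", match.group(1),
                      flags=re.DOTALL | re.IGNORECASE)
    if len(rows) < 2:
        return html  # only a header row, already well-formed

    fixed = ("<thead>" + rows[0] + "</thead>"
             + "<tbody>" + "".join(rows[1:]) + "</tbody>")
    # Splice the repaired section back into the original document.
    return html[:match.start()] + fixed + html[match.end():]
```

The repaired string can then be passed to pd.read_html, which should return a DataFrame with the six header cells as columns and the four data rows as rows. Recent pandas versions warn about literal HTML strings, so wrapping the string in io.StringIO may be needed there.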