python - How to convert a str to a BeautifulSoup Tag
Problem description
I have two HTML files that look like this. File 1:
<h3>
First heading
</h3>
<ol>
<li>
hi
</li>
</ol>
<h3>
Second
</h3>
<ol>
<li>
second
</li>
</ol>
File 2:
<h3>
First heading
</h3>
<ol>
<li>
hello
</li>
</ol>
<h3>
Second
</h3>
<ol>
<li>
second to second
</li>
</ol>
I need to append the li elements from the second document to the HTML of the first document, under the matching h3. Here is my code:
from bs4 import BeautifulSoup

soup = BeautifulSoup(html_string)
h3_tags = soup.find_all('h3')
ol_tags = [each_h3.find_next('ol') for each_h3 in h3_tags]

soup = BeautifulSoup(html_string_new)
h3_tags_new = soup.find_all('h3')
ol_tags_new = [each_h3.find_next('ol') for each_h3 in h3_tags_new]

countries_old = []
countries_new = []
html_new = ""
for i in h3_tags:
    countries_old.append(i.text)
for i in h3_tags_new:
    countries_new.append(i.text)

for country in countries_new:
    idx = countries_old.index(country)
    # Strip the surrounding <ol> and </ol> from the serialized tag
    tag = str(ol_tags[idx])
    tag = tag[:-5]
    tag = tag[4:]
    idx_new = countries_new.index(country)
    tag_new = str(ol_tags_new[idx_new])
    tag_new = tag_new[:-5]
    tag_new = tag_new[4:]
    tag = "<ol>" + tag + tag_new + "</ol>"
    ol_tags[idx] = tag
    html_new += h3_tags[idx]  # the TypeError is raised here: h3_tags[idx] is a Tag, not a str
    html_new += tag

with open("check.html", "w", encoding="utf8") as html_file:
    html_file.write(html_new)
    html_file.close()

import pypandoc
output = pypandoc.convert(source='check.html', format='html', to='docx', outputfile='test.docx', extra_args=["-M2GB", "+RTS", "-K64m", "-RTS"])
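This code takes each h3 from the second document, looks up its index, and fetches the ol at the same index in each document. It then strips the ol wrapper from both tags and concatenates their contents. The results are accumulated in html_new, which is later written to the HTML file. But when I concatenate the ol with the h3, I get this error: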
TypeError: can only concatenate str (not "Tag") to str
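The error can be reproduced in isolation, which may make the cause easier to see: find_all returns bs4 Tag objects, and a Tag cannot be added to a str. The following is a minimal sketch, not the asker's full script; the sample markup and the names error_message, ol_tag, and item_count are illustrative assumptions. It shows both directions of the conversion: str(tag) to serialize a Tag, and parsing markup again to get a Tag back from a str.

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup("<h3>First heading</h3><ol><li>hi</li></ol>", "html.parser")
h3 = soup.find("h3")  # a bs4 Tag, not a str

html_new = ""
try:
    html_new += h3  # same kind of operation as in the loop above
except TypeError as exc:
    error_message = str(exc)  # illustrative name, holds the TypeError text

# Tag -> str: serialize explicitly before concatenating.
html_new += str(h3) + str(h3.find_next("ol"))

# str -> Tag: parse the markup again and pick the element out of the new soup.
ol_tag = BeautifulSoup("<ol><li>hi</li><li>hello</li></ol>", "html.parser").find("ol")
item_count = len(ol_tag.find_all("li"))

print(error_message)
print(html_new)
print(ol_tag.name, item_count)
```

In the script above, the smallest change along these lines would be to wrap h3_tags[idx] in str() before concatenating, although slicing serialized markup by character count stays fragile if the ol ever carries attributes.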
Edit: expected output:
<h3>
First heading
</h3>
<ol>
<li>
hello
</li>
<li>
hi
</li>
</ol>
<h3>
Second
</h3>
<ol>
<li>
second to second
</li>
<li>
second
</li>
</ol>
Solution
Try:
from bs4 import BeautifulSoup
html1 = """
<h3>
First heading
</h3>
<ol>
<li>
hi
</li>
</ol>
<h3>
Second
</h3>
<ol>
<li>
second
</li>
</ol>
"""
html2 = """
<h3>
First heading
</h3>
<ol>
<li>
hello
</li>
</ol>
<h3>
Second
</h3>
<ol>
<li>
second to second
</li>
</ol>
"""
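soup1 = BeautifulSoup(html1, "html.parser")
soup2 = BeautifulSoup(html2, "html.parser")

for li in soup2.select("h3 + ol > li"):
    # Heading that this <li> belongs to in the second document
    h3_text = li.find_previous("h3").get_text(strip=True)
    # Matching heading in the first document (substring match on its text)
    h3_soup1 = soup1.find("h3", string=lambda t: t and h3_text in t)
    if not h3_soup1:
        continue
    # Move the <li> to the top of the <ol> that follows that heading
    h3_soup1.find_next("ol").insert(0, li)

print(soup1.prettify())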
Prints:
<h3>
First heading
</h3>
<ol>
<li>
hello
</li>
<li>
hi
</li>
</ol>
<h3>
Second
</h3>
<ol>
<li>
second to second
</li>
<li>
second
</li>
</ol>
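If a list in the second document holds more than one li, inserting each one at position 0 reverses their relative order. The sketch below is one possible variation on the answer above, not part of it: merge_lists is a hypothetical helper name, it matches headings by exact stripped text instead of by substring, and it assumes every h3 is followed by its own ol.

```python
from bs4 import BeautifulSoup


def merge_lists(old_html, new_html):
    """Prepend each <li> of new_html to the <ol> under the same <h3> in old_html.

    Hypothetical helper; assumes each <h3> is followed by its own <ol>.
    """
    old = BeautifulSoup(old_html, "html.parser")
    new = BeautifulSoup(new_html, "html.parser")

    # Map exact heading text -> the <ol> that follows it in the old document.
    targets = {
        h3.get_text(strip=True): h3.find_next("ol") for h3 in old.find_all("h3")
    }

    for h3 in new.find_all("h3"):
        target = targets.get(h3.get_text(strip=True))
        source = h3.find_next("ol")
        if target is None or source is None:
            continue  # heading missing from the old document, or no list found
        # Walk the new items backwards so they keep their order at the top.
        for li in reversed(source.find_all("li", recursive=False)):
            target.insert(0, li)  # insert() moves the tag out of the new soup

    return old


old_doc = "<h3>A</h3><ol><li>hi</li></ol><h3>B</h3><ol><li>second</li></ol>"
new_doc = (
    "<h3>A</h3><ol><li>hello</li><li>hey</li></ol>"
    "<h3>B</h3><ol><li>second to second</li></ol>"
)
merged = merge_lists(old_doc, new_doc)
print(merged.prettify())
```

Because the function returns a soup rather than a string, the result can be written to check.html with str(merged) or merged.prettify(), with no manual slicing of tag strings.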