How to filter specific tags

Problem description

<h2>Summary</h2>
<p>This is summary one.</p>
<p>contains details of summary1.</p>

<h2>Software/OS</h2>
<p>windows xp</p>

<h2>HARDWARE</h2>
<p>Intel core i5</p>
<p>8 GB RAM</p>

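I want to build a dictionary from the HTML above, where each key is the text of a heading tag and each value is the text of the paragraph tags that follow it.

I want the output in this format:

{"summary":["This is summary one.","contains details of summary1."], "Software/OS": "windows xp", "HARDWARE": ["Intel core i5","8 GB RAM"]}

Can anyone help me with this? Thanks in advance.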

Tags: python, beautifulsoup

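Solution

You can use this script to create a dictionary where each key is the text of an <h2> tag and each value is the list of <p> texts that follow it: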

from bs4 import BeautifulSoup


txt = '''<h2>Summary</h2>
<p>This is summary one.</p>
<p>contains details of summary1.</p>

<h2>Software/OS</h2>
<p>windows xp</p>

<h2>HARDWARE</h2>
<p>Intel core i5</p>
<p>8 GB RAM</p>'''

soup = BeautifulSoup(txt, 'html.parser')

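# Map each <h2> heading's text to the list of <p> texts that follow it.
out = {}
for p in soup.select('p'):
    # find_previous('h2') returns the nearest <h2> that precedes this <p>
    out.setdefault(p.find_previous('h2').text, []).append(p.text)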

print(out)

Prints:

{'Summary': ['This is summary one.', 'contains details of summary1.'], 'Software/OS': ['windows xp'], 'HARDWARE': ['Intel core i5', '8 GB RAM']}
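
The script above assumes every <p> has an <h2> somewhere before it. If a paragraph appears before the first heading, find_previous('h2') returns None and accessing .text raises AttributeError. A minimal sketch of a guarded variant follows; the function name headings_to_paragraphs and its parameters are hypothetical, chosen for illustration only, and skipping orphan paragraphs is just one possible policy:

```python
from bs4 import BeautifulSoup


def headings_to_paragraphs(html, heading="h2", para="p"):
    """Group paragraph texts under the nearest preceding heading text.

    Hypothetical helper: paragraphs with no preceding heading are skipped.
    """
    soup = BeautifulSoup(html, "html.parser")
    out = {}
    for p in soup.find_all(para):
        h = p.find_previous(heading)
        if h is None:
            # No heading precedes this paragraph, so there is no key for it.
            continue
        key = h.get_text(strip=True)
        out.setdefault(key, []).append(p.get_text(strip=True))
    return out


sample = "<p>intro</p><h2>A</h2><p>one</p><p>two</p><h2>B</h2><p>three</p>"
result = headings_to_paragraphs(sample)
print(result)
```

Here the leading "intro" paragraph is dropped because it has no heading to belong to; collecting such paragraphs under a placeholder key would be an equally reasonable choice.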

If you don't want single-item lists, you can additionally run:

for k in out:
    if len(out[k]) == 1:
        out[k] = out[k][0]

print(out)

Prints:

{'Summary': ['This is summary one.', 'contains details of summary1.'], 'Software/OS': 'windows xp', 'HARDWARE': ['Intel core i5', '8 GB RAM']}
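
The loop above rewrites the dictionary in place. If the original lists should stay untouched, the same unwrapping can be sketched as a dict comprehension that builds a new dictionary instead. The name unwrap_singletons is hypothetical, and the input is the dictionary printed by the first script:

```python
def unwrap_singletons(mapping):
    """Return a new dict where one-item lists are replaced by their only item."""
    return {k: (v[0] if len(v) == 1 else v) for k, v in mapping.items()}


out = {
    "Summary": ["This is summary one.", "contains details of summary1."],
    "Software/OS": ["windows xp"],
    "HARDWARE": ["Intel core i5", "8 GB RAM"],
}

flat = unwrap_singletons(out)
print(flat)
```

Note that mixing strings and lists as values means callers have to check the type of each value before using it, so keeping every value as a list is often simpler to work with.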
