python - How do I parse all HTML tags in a BeautifulSoup object?
Problem description
I can't parse the HTML tags nested inside a BeautifulSoup object. Here is my code:
response = requests.get(
'myurl',
headers={'Authorization': 'Bearer ' + auth_token},
params=params
)
soup = BeautifulSoup(response.content, 'html.parser')
soup = json.loads(str(soup))
all_data.extend(soup['data'])
However, soup['data'] is a list of dictionaries like this:
[{"_id":"123","tags":[],"user":{"_id":"u1","name":"ASD Na"},"shared":"<p>Personal: Parents </p><p><br/></p><p>KM: </p><p><br/></p>","private":"","created":"2019-01-26T16:54:56.283Z","district":"543543","creator":{"_id":"c432","name":"Cass Man"},"lastModified":"2019-01-26T16:54:56.284Z"},
{"_id":"234","tags":[],"user":{"_id":"u2","name":"Tyler Dass"},"shared":"Hi,<p>It's great to see your clear.</p>","private":"","created":"2019-11-26T15:48:43.314Z","district":"543543","creator":{"_id":"432","name":"John"},"lastModified":"2019-11-26T15:48:43.315Z"}]
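In this sample the tags appear only in the shared key, but they do show up in multiple fields. How can I access soup and use the various BeautifulSoup functions to get the correct text from all fields? I tried soup.get_text(), but it did not work.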
Solution
Judging from the example, you are receiving a JSON response, so you don't need BeautifulSoup to parse it:
response = requests.get('myurl', headers={'Authorization': 'Bearer ' + auth_token}, params=params)
data = response.json() # <-- note the .json() call
all_data.extend(data['data'])
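To see what this step hands back without making a network call, the sketch below decodes a shortened copy of the sample payload with the standard json module, which is roughly what .json() does with the response body. The body string is an abbreviated stand-in for the real response (an assumption for illustration), not actual server output.

```python
import json

# Abbreviated stand-in for the response body (assumption: the real API
# returns an object with a "data" list, as shown in the question).
body = '{"data": [{"_id": "123", "user": {"_id": "u1", "name": "ASD Na"}, "shared": "<p>Personal: Parents </p><p><br/></p>"}]}'

data = json.loads(body)
first = data["data"][0]

# Nested JSON objects become dicts; the HTML stays an ordinary string
# that still needs its own parsing step.
print(type(first["user"]).__name__)
print(first["shared"])
```

The point is that JSON decoding and HTML parsing are separate steps: the first yields dicts and lists, and only individual string values such as shared contain markup.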
Then, to get the information from the shared key, you can convert its value to a BeautifulSoup object:
for d in all_data:
    soup = BeautifulSoup(d['shared'], 'html.parser')
    # print only text from <p> tags:
    print([p.get_text(strip=True) for p in soup.select('p')])
Prints:
['Personal: Parents', '', 'KM:', '']
["It's great to see your clear."]
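Since the question mentions that tags can appear in more than one field, one possible extension is to walk each record recursively and strip markup from every string value. This is a minimal sketch rather than part of the original answer: strip_html is a hypothetical helper name, and it assumes every string may safely be treated as HTML, so a plain-text field containing a literal < or & could be altered.

```python
from bs4 import BeautifulSoup


def strip_html(value):
    """Recursively replace every string in a JSON-like structure with its text."""
    if isinstance(value, dict):
        return {key: strip_html(item) for key, item in value.items()}
    if isinstance(value, list):
        return [strip_html(item) for item in value]
    if isinstance(value, str):
        # get_text() drops the tags; the separator keeps adjacent
        # text fragments from running together.
        return BeautifulSoup(value, "html.parser").get_text(separator=" ", strip=True)
    return value  # numbers, booleans and None pass through unchanged


# Shortened copy of the second record from the question.
record = {
    "_id": "234",
    "tags": [],
    "user": {"_id": "u2", "name": "Tyler Dass"},
    "shared": "Hi,<p>It's great to see your clear.</p>",
}

cleaned = strip_html(record)
print(cleaned["shared"])
```

Unlike the select('p') approach, this also keeps text that sits outside any <p> tag (the leading "Hi," here), which may or may not be what you want.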