How do I extract part of the information from JSON data in Python?

Problem description

I have a list of JSON data and want to extract every id from "data-target-user-id". How can I do this with regex or BeautifulSoup?

my_lst_JSON = ['<div id="a-exse-tile" class="j-exse profile-tile" data-exp-enabled="true" user-id="115389" user-type="8" data-viewing-self="false" <!--section --></div></div></div>',
               '<div id="a-exse-tile" class="j-exse profile-tile" data-exp-enabled="true" user-id="1109" user-type="9" data-viewing-self="false" <!--section --></div></div></div>',
               '<div id="a-exse-tile" class="j-exse profile-tile" data-exp-enabled="true" user-id="2890" user-type="7" data-viewing-self="false" <!--section --></div></div></div>',
               '<div id="a-exse-tile" class="j-exse profile-tile" data-exp-enabled="true" user-id="22567" user-type="8" data-viewing-self="false" <!--section --></div></div></div>',
               '<div id="a-exse-tile" class="j-exse profile-tile" data-exp-enabled="true" user-id="33872" user-type="1" data-viewing-self="false" <!--section --></div></div></div>']

Expected output:

['115389', '1109', '2890', '22567', '33872']

I tried the approach suggested by @zx485, using the following code:

ids_lst = []

for data in my_lst_JSON:
    soup = BeautifulSoup(data, 'xml')
    ids_lst.append([item.text for item in soup.findAll('div/@user-id')])

It returns an empty list...

Any ideas are appreciated!

Thank you.

Tags: json, python-3.x, regex, xml, beautifulsoup

Solution

This should work; it parses each string and reads the user-id attribute of its first div:

from bs4 import BeautifulSoup as bs
        
my_lst_JSON = ['<div id="a-exse-tile" class="j-exse profile-tile" data-exp-enabled="true" user-id="115389" user-type="8" data-viewing-self="false" <!--section --></div></div></div>',
         '<div id="a-exse-tile" class="j-exse profile-tile" data-exp-enabled="true" user-id="1109" user-type="9" data-viewing-self="false" <!--section --></div></div></div>',
         '<div id="a-exse-tile" class="j-exse profile-tile" data-exp-enabled="true" user-id="2890" user-type="7" data-viewing-self="false" <!--section --></div></div></div>',
         '<div id="a-exse-tile" class="j-exse profile-tile" data-exp-enabled="true" user-id="22567" user-type="8" data-viewing-self="false" <!--section --></div></div></div>',
         '<div id="a-exse-tile" class="j-exse profile-tile" data-exp-enabled="true" user-id="33872" user-type="1" data-viewing-self="false" <!--section --></div></div></div>']
        
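# Name the parser explicitly; calling bs(item) without one makes bs4 guess and emit a warning.
user_ids = [bs(item, "html.parser").find("div")["user-id"] for item in my_lst_JSON]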
print(user_ids)
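
If installing bs4 is not an option, the same "read the attribute from the div start tag" idea can probably be sketched with the standard library's html.parser module. This is only an illustration: the class name UserIdCollector, the function name collect_user_ids, and the abbreviated sample strings are made up here, and it assumes every fragment opens with a div that carries user-id.

```python
from html.parser import HTMLParser


class UserIdCollector(HTMLParser):
    """Collects the user-id attribute of every <div> start tag it sees."""

    def __init__(self):
        super().__init__()
        self.user_ids = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the tag being opened.
        if tag == "div":
            value = dict(attrs).get("user-id")
            if value is not None:
                self.user_ids.append(value)


def collect_user_ids(fragments):
    """Parse each fragment separately and gather the ids in input order."""
    ids = []
    for fragment in fragments:
        collector = UserIdCollector()
        collector.feed(fragment)
        collector.close()
        ids.extend(collector.user_ids)
    return ids


# Abbreviated copies of the strings from the question (hypothetical shortening).
sample = [
    '<div id="a-exse-tile" user-id="115389" user-type="8" data-viewing-self="false" <!--section --></div></div></div>',
    '<div id="a-exse-tile" user-id="1109" user-type="9" data-viewing-self="false" <!--section --></div></div></div>',
]

print(collect_user_ids(sample))
```

Looking the value up by attribute name means it does not matter where user-id sits inside the tag.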

But I prefer @Boomshakalaka's solution, which is more elegant:

import re
my_lst_JSON = ['<div id="a-exse-tile" class="j-exse profile-tile" data-exp-enabled="true" user-id="115389" user-type="8" data-viewing-self="false" <!--section --></div></div></div>',
 '<div id="a-exse-tile" class="j-exse profile-tile" data-exp-enabled="true" user-id="1109" user-type="9" data-viewing-self="false" <!--section --></div></div></div>',
 '<div id="a-exse-tile" class="j-exse profile-tile" data-exp-enabled="true" user-id="2890" user-type="7" data-viewing-self="false" <!--section --></div></div></div>',
 '<div id="a-exse-tile" class="j-exse profile-tile" data-exp-enabled="true" user-id="22567" user-type="8" data-viewing-self="false" <!--section --></div></div></div>',
 '<div id="a-exse-tile" class="j-exse profile-tile" data-exp-enabled="true" user-id="33872" user-type="1" data-viewing-self="false" <!--section --></div></div></div>']

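# Takes the first run of digits in each string (the user-id here); '|$' yields '' when there are none.
user_ids = [re.findall(r'\d+|$', item)[0] for item in my_lst_JSON]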
print(user_ids)
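
One caveat with the one-liner above: it works because the user id happens to be the first number in each string. If an earlier attribute contained digits, it would return that number instead. A minimal sketch of a stricter pattern follows, assuming the attribute is always written as user-id="<digits>" with double quotes; the names USER_ID_RE and user_ids_by_attribute and the sample fragments are made up for illustration.

```python
import re

# Assumes the attribute is always written as user-id="<digits>" with double quotes.
USER_ID_RE = re.compile(r'\buser-id="(\d+)"')


def user_ids_by_attribute(fragments):
    """Return the user-id value of each fragment, skipping fragments without one."""
    ids = []
    for fragment in fragments:
        match = USER_ID_RE.search(fragment)
        if match:
            ids.append(match.group(1))
    return ids


# Made-up fragments: the first has digits in "id" before the user-id attribute,
# and the last has no user-id at all.
samples = [
    '<div id="tile-7" class="profile-tile" user-id="115389" user-type="8"',
    '<div id="a-exse-tile" class="profile-tile" user-id="1109" user-type="9"',
    '<div id="a-exse-tile" class="profile-tile" user-type="9"',
]

print(user_ids_by_attribute(samples))       # ['115389', '1109']
print(re.findall(r'\d+|$', samples[0])[0])  # 7, not the user id
```

Fragments without a user-id are skipped here; returning None or an empty string for them instead would keep the output aligned with the input list.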
