How do I convert a JSON SList to a pandas DataFrame?

Problem description

a = ['{"type": "book",', 
     '"title": "sometitle",', 
     '"author": [{"name": "somename"}],', 
     '"year": "2000",', 
     '"identifier": [{"type": "ISBN", "id": "1234567890"}],', 
     '"publisher": "somepublisher"}', '',
     '{"type": "book",', 
     '"title": "sometitle2",', 
     '"author": [{"name": "somename2"}],', 
     '"year": "2001",', 
     '"identifier": [{"type": "ISBN", "id": "1234567890"}],', 
     '"publisher": "somepublisher"}', '']

I have this confusing SList, and I would like to eventually turn it into a tidy pandas DataFrame.

I have tried many things, for example:

i = iter(a)
b = dict(zip(i, i))

Unfortunately, this creates a dictionary that looks even worse:

{'{"type": "book",':
...

Where I had an SList of dictionaries, I now have a dictionary of dictionaries.

I also tried

pd.json_normalize(a)

but this raises the error message AttributeError: 'str' object has no attribute 'values'

I also tried

r = json.dumps(a.l)
loaded_r = json.loads(r)
print(loaded_r)

but this produces a list

['{"type": "book",',
...

Again, in the end I would like a pandas DataFrame like this

type   title        author      year ...
book   sometitle    somename    2000 ...
book   sometitle2   somename2   2001

Clearly, I have not yet reached the point where I can feed the data to a pandas function. Every time I try, the function screams at me...

Tags: python, pandas, jupyter-notebook, ipython

Solution


import json
import pandas as pd

a = ['{"type": "book",', 
     '"title": "sometitle",', 
     '"author": [{"name": "somename"}],', 
     '"year": "2000",', 
     '"identifier": [{"type": "ISBN", "id": "1234567890"}],', 
     '"publisher": "somepublisher"}', '',
     '{"type": "book",', 
     '"title": "sometitle2",', 
     '"author": [{"name": "somename2"}],', 
     '"year": "2001",', 
     '"identifier": [{"type": "ISBN", "id": "1234567890"}],', 
     '"publisher": "somepublisher"}', '']

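# Blank strings separate the records: replace each with a comma, join all
# fragments into one string, strip the trailing comma, and wrap the result
# in [ ] so that it forms a single JSON array.
b = "[%s]" % ''.join([',' if i == '' else i for i in a ]).strip(',')
data = json.loads(b)  # list of dicts, one per record
df = pd.DataFrame(data)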

print(df)

   type       title                   author  year  \
0  book   sometitle   [{'name': 'somename'}]  2000   
1  book  sometitle2  [{'name': 'somename2'}]  2001   

                               identifier      publisher  
0  [{'type': 'ISBN', 'id': '1234567890'}]  somepublisher  
1  [{'type': 'ISBN', 'id': '1234567890'}]  somepublisher
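
The frame above still holds lists of dicts in the author and identifier columns, whereas the question asks for plain values such as somename. One possible way to get there is to flatten each record by hand before building the DataFrame. This is only a sketch under assumptions: the names rows, flat, and isbn are made up for illustration, every record is assumed to carry the same keys as the sample data, and converting year to int is a choice the question does not ask for.

```python
import json
import pandas as pd

# Sample data from the question: JSON fragments, with '' between records.
a = ['{"type": "book",', '"title": "sometitle",',
     '"author": [{"name": "somename"}],', '"year": "2000",',
     '"identifier": [{"type": "ISBN", "id": "1234567890"}],',
     '"publisher": "somepublisher"}', '',
     '{"type": "book",', '"title": "sometitle2",',
     '"author": [{"name": "somename2"}],', '"year": "2001",',
     '"identifier": [{"type": "ISBN", "id": "1234567890"}],',
     '"publisher": "somepublisher"}', '']

# Same joining trick as in the solution: build one JSON array string.
b = "[%s]" % "".join("," if s == "" else s for s in a).strip(",")
records = json.loads(b)

# Flatten the nested lists (assumes the structure shown in the sample data).
rows = []
for rec in records:
    rows.append({
        "type": rec["type"],
        "title": rec["title"],
        # Join all author names in case a record has more than one author.
        "author": ", ".join(p["name"] for p in rec["author"]),
        "year": int(rec["year"]),
        # Take the first identifier tagged ISBN, or None if there is none.
        "isbn": next((i["id"] for i in rec["identifier"]
                      if i["type"] == "ISBN"), None),
        "publisher": rec["publisher"],
    })

flat = pd.DataFrame(rows)
print(flat)
```

If the real data has records with missing keys, rec.get(...) with a default would be a safer lookup than rec[...].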
