python - How to parse a JSON column in a DataFrame and append new columns for selected keys
Problem description
Hi, I am a beginner in Python and R. I have a quick question:
#I have a data frame that looks like this:
# Import pandas library
import pandas as pd
# initialize list of lists
data = [['BarackObama', {'cap': {'english': 0.014543680863094452, 'universal': 0.005213309669283597},
'categories': {'content': 0.13252377443365895, 'friend': 0.27037007428252813,
'network': 0.07904647486470226, 'sentiment': 0.13142975907620189,
'temporal': 0.0560116435619808, 'user': 0.2120791504162319},
'display_scores': {'content': 0.7, 'english': 1.1, 'friend': 1.4, 'network': 0.4,
'sentiment': 0.7, 'temporal': 0.3, 'universal': 0.6, 'user': 1.1},
'scores': {'english': 0.22180647190550215, 'universal': 0.11116719108518804},
'user': {'id_str': '813286', 'screen_name': 'BarackObama'}}],
['realDonaldTrump', {'cap': {'english': 0.0014187924969112314, 'universal': 0.0018655051726169808},
'categories': {'content': 0.062020196630026815, 'friend': 0.19869669732913162,
'network': 0.05312993020038088, 'sentiment': 0.05985886859558471,
'temporal': 0.07924665710801207, 'user': 0.037517839108884524},
'display_scores': {'content': 0.3, 'english': 0.2, 'friend': 1.0, 'network': 0.3,
'sentiment': 0.3, 'temporal': 0.4, 'universal': 0.2, 'user': 0.2},
'scores': {'english': 0.03265990956683609, 'universal': 0.032398754737074244},
'user': {'id_str': '25073877', 'screen_name': 'realDonaldTrump'}}]]
# Create the pandas DataFrame
df = pd.DataFrame(data, columns=['Name', 'botScore'])
# print dataframe.
print(df)
# Name botScore
#0 BarackObama {'cap': {'english': 0.014543680863094452, 'uni...
#1 realDonaldTrump {'cap': {'english': 0.0014187924969112314, 'un...
So how can I get something like the following, where I pick keys and values from the display_scores
portion of the JSON column and append them as new columns to the existing DataFrame?
# data-wrangling part using the display_scores key in json column....
# print(df)
# Name botScore english friend sentiment
#0 BarackObama {'cap':... 1.1 1.4 0.7
#1 realDonaldTrump {'cap':... 0.2 1.0 0.3
I would really appreciate your help with this! I looked at several past posts, but I couldn't solve my problem using their approaches:
Creating Dataframe with JSON Keys
How to insert specific keys from json file into a data frame in Python
Solution
First, fix data:
- Add the name at position 0 of each list to the dict at position 1.
- Convert the list of lists to a list of dicts.
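# copy each name (position 0) into its dict (position 1)
for x in data:
    x[1]['name'] = x[0]
# keep only the dicts: list of lists -> list of dicts
data2 = [x[1] for x in data]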
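If you would rather not modify data in place, the same two steps can be written as a single list comprehension. This is a minimal sketch; the two-row data below is a trimmed, hypothetical stand-in for the question's data. The copy is shallow: nested dicts such as display_scores are still shared with data.

```python
# Trimmed, hypothetical stand-in for the question's data (same shape)
data = [
    ['BarackObama', {'display_scores': {'english': 1.1, 'friend': 1.4}}],
    ['realDonaldTrump', {'display_scores': {'english': 0.2, 'friend': 1.0}}],
]

# Build new top-level dicts with 'name' added; the dicts inside data are not modified
data2 = [{**record, 'name': name} for name, record in data]

print(data2[0])
```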
Process the list of dicts:
- Use the flatten package; only the relevant function is included here.
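def flatten_json(nested_json: dict, exclude: list = None, sep: str = '_') -> dict:
    """
    Flatten a nested dict (including any nested lists) into a single-level dict.
    Nested keys are joined with sep; keys listed in exclude are skipped.
    """
    exclude = exclude or []  # avoid a mutable default argument
    out = dict()

    def flatten(x, name: str = ''):
        if isinstance(x, dict):
            for a in x:
                if a not in exclude:
                    flatten(x[a], f'{name}{a}{sep}')
        elif isinstance(x, list):
            # list items are keyed by their index
            for i, a in enumerate(x):
                flatten(a, f'{name}{i}{sep}')
        else:
            # leaf value: drop the trailing separator from the key
            out[name[:-1]] = x

    flatten(nested_json)
    return out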
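As an alternative to a hand-written flattener, pandas provides pd.json_normalize (available at the top level since pandas 1.0), which flattens nested dicts and joins the keys with sep. This is a sketch assuming data2 has the list-of-dicts shape built in the first step, trimmed here to two nested keys. Unlike flatten_json, json_normalize does not expand lists into indexed columns; list values are kept as-is.

```python
import pandas as pd

# Trimmed, hypothetical version of data2 (list of dicts with 'name' added)
data2 = [
    {'display_scores': {'english': 1.1, 'friend': 1.4, 'sentiment': 0.7},
     'user': {'id_str': '813286', 'screen_name': 'BarackObama'},
     'name': 'BarackObama'},
    {'display_scores': {'english': 0.2, 'friend': 1.0, 'sentiment': 0.3},
     'user': {'id_str': '25073877', 'screen_name': 'realDonaldTrump'},
     'name': 'realDonaldTrump'},
]

# Nested dict keys are joined with sep, e.g. display_scores_english, user_screen_name
flat = pd.json_normalize(data2, sep='_')
print(flat)
```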
df = pd.DataFrame([flatten_json(x) for x in data2])
print(df)
cap_english cap_universal categories_content categories_friend categories_network categories_sentiment categories_temporal categories_user display_scores_content display_scores_english display_scores_friend display_scores_network display_scores_sentiment display_scores_temporal display_scores_universal display_scores_user scores_english scores_universal user_id_str user_screen_name name
0 0.014544 0.005213 0.132524 0.270370 0.079046 0.131430 0.056012 0.212079 0.7 1.1 1.4 0.4 0.7 0.3 0.6 1.1 0.221806 0.111167 813286 BarackObama BarackObama
1 0.001419 0.001866 0.062020 0.198697 0.053130 0.059859 0.079247 0.037518 0.3 0.2 1.0 0.3 0.3 0.4 0.2 0.2 0.032660 0.032399 25073877 realDonaldTrump realDonaldTrump
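The flattened frame contains every key, whereas the question asks for only a few display_scores values appended to the original df. A minimal sketch of that, assuming the selected keys are english, friend and sentiment (taken from the question's expected output) and using a trimmed copy of data: build a frame from each row's display_scores dict, reuse df's index, and join the chosen columns back. If some rows might lack display_scores, d.get('display_scores', {}) would avoid a KeyError.

```python
import pandas as pd

# Trimmed copy of the question's data (only display_scores kept)
data = [
    ['BarackObama', {'display_scores': {'content': 0.7, 'english': 1.1,
                                        'friend': 1.4, 'sentiment': 0.7}}],
    ['realDonaldTrump', {'display_scores': {'content': 0.3, 'english': 0.2,
                                            'friend': 1.0, 'sentiment': 0.3}}],
]
df = pd.DataFrame(data, columns=['Name', 'botScore'])

keys = ['english', 'friend', 'sentiment']  # keys to pull out of display_scores

# One row per botScore dict, one column per display_scores key, aligned to df
scores = pd.DataFrame(
    [d['display_scores'] for d in df['botScore']],
    index=df.index,
)

# Append only the selected keys as new columns after Name and botScore
df = df.join(scores[keys])
print(df[['Name'] + keys])
```

Because scores shares df's index, join lines the rows up even if df was filtered or reordered beforehand.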