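python - Creating new columns from a JSON column
Problem description
I have a DataFrame with the columns event_name and event_json, which holds JSON objects of different types (not every object has the same keys). I want to split this column into new columns, one per key in the JSON objects.
Creating the df: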
import json
import pandas as pd
from pandas import json_normalize  # pandas >= 1.0; older versions: from pandas.io.json import json_normalize

d = [{'event_datetime': '2019-01-08 00:09:30',
      'event_json': '{"lvl":"450","tok":"1212","snum":"257","udid":"122112"}',
      'event_name': 'AdsClick'},
     {'event_datetime': '2019-01-08 00:43:21',
      'event_json': '{"lvl":"902","udid":"3123","tok":"4214","snum":"1387"}',
      'event_name': 'AdsClick'},
     # this event has no event_name and two extra keys (col12, col13)
     {'event_datetime': '2019-02-08 00:05:01',
      'event_json': '{"lvl":"1415","udid":"214124","tok":"213123","snum":"2416","col12":"2416","col13":"2416"}'}]
df12 = json_normalize(d)
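A side note on this construction step: json_normalize(d) leaves event_json as a plain string, because it flattens nested dicts, not JSON text. If you build the frame yourself, one alternative (not part of the original answer) is to parse the strings before normalizing, so the keys come out as columns named event_json.lvl, event_json.tok and so on. This is a minimal sketch, assuming pandas >= 1.0 for pd.json_normalize and that every event_json value is a valid JSON object; the name parsed is illustrative.

```python
import json

import pandas as pd

d = [{'event_datetime': '2019-01-08 00:09:30',
      'event_json': '{"lvl":"450","tok":"1212","snum":"257","udid":"122112"}',
      'event_name': 'AdsClick'},
     {'event_datetime': '2019-01-08 00:43:21',
      'event_json': '{"lvl":"902","udid":"3123","tok":"4214","snum":"1387"}',
      'event_name': 'AdsClick'},
     {'event_datetime': '2019-02-08 00:05:01',
      'event_json': '{"lvl":"1415","udid":"214124","tok":"213123","snum":"2416","col12":"2416","col13":"2416"}'}]

# Replace each JSON string with the parsed dict so json_normalize can flatten it.
parsed = [{**row, 'event_json': json.loads(row['event_json'])} for row in d]

# Nested keys become "event_json.<key>"; keys missing from a row become NaN.
df12 = pd.json_normalize(parsed)
print(df12)
```

The prefixed column names avoid clashes with existing columns; pass sep='_' to pd.json_normalize if you prefer event_json_lvl.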
Sample:
event_datetime event_json event_name
0 2019-01-08 00:09:30 {"lvl":"450","tok":"1212","snum":"257","udid":... AdsClick
1 2019-01-08 00:43:21 {"lvl":"902","udid":"3123","tok":"4214","snum"... AdsClick
2 2019-02-08 00:05:01 {"lvl":"1415","udid":"214124","tok":"213123","... NaN
Currently I use this code:
df12 = df12.merge(df12['event_json'].apply(lambda x: pd.Series(json.loads(x))), left_index=True, right_index=True)
Result:
event_datetime event_json event_name lvl snum tok udid col12 col13
0 2019-01-08 00:09:30 {"lvl":"450","tok":"1212","snum":"257","udid":... AdsClick 450 257 1212 122112 NaN NaN
1 2019-01-08 00:43:21 {"lvl":"902","udid":"3123","tok":"4214","snum"... AdsClick 902 1387 4214 3123 NaN NaN
2 2019-02-08 00:05:01 {"lvl":"1415","udid":"214124","tok":"213123","... NaN 1415 2416 213123 214124 2416 2416
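But this is slow. Do you have any ideas for faster code?
Solution
Use a list comprehension with the DataFrame constructor, then add the result to the original frame with DataFrame.join: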
df = df12.join(pd.DataFrame([json.loads(x) for x in df12['event_json']]))
print (df)
event_datetime event_json \
0 2019-01-08 00:09:30 {"lvl":"450","tok":"1212","snum":"257","udid":...
1 2019-01-08 00:43:21 {"lvl":"902","udid":"3123","tok":"4214","snum"...
2 2019-02-08 00:05:01 {"lvl":"1415","udid":"214124","tok":"213123","...
event_name col12 col13 lvl snum tok udid
0 AdsClick NaN NaN 450 257 1212 122112
1 AdsClick NaN NaN 902 1387 4214 3123
2 NaN 2416 2416 1415 2416 213123 214124
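To check the speed-up on your own machine, here is a minimal timing sketch. It assumes the three sample rows repeated 1,000 times are a reasonable stand-in for a larger frame (the repeat count and the helper names with_apply and with_constructor are hypothetical). It also confirms that both versions produce the same values; only the order of the new columns may differ, hence check_like=True. The likely source of the difference is that the apply version builds a separate pd.Series for every row, while the constructor version builds one DataFrame from plain dicts in a single call.

```python
import json
import timeit

import pandas as pd

d = [{'event_datetime': '2019-01-08 00:09:30',
      'event_json': '{"lvl":"450","tok":"1212","snum":"257","udid":"122112"}',
      'event_name': 'AdsClick'},
     {'event_datetime': '2019-01-08 00:43:21',
      'event_json': '{"lvl":"902","udid":"3123","tok":"4214","snum":"1387"}',
      'event_name': 'AdsClick'},
     {'event_datetime': '2019-02-08 00:05:01',
      'event_json': '{"lvl":"1415","udid":"214124","tok":"213123","snum":"2416","col12":"2416","col13":"2416"}'}]

# pd.DataFrame(d) matches json_normalize(d) here because the dicts are flat.
# Repeat the sample rows to get something worth timing (size is arbitrary).
big = pd.concat([pd.DataFrame(d)] * 1000, ignore_index=True)


def with_apply(df):
    # Question's approach: one pd.Series object per row, then a merge.
    expanded = df['event_json'].apply(lambda x: pd.Series(json.loads(x)))
    return df.merge(expanded, left_index=True, right_index=True)


def with_constructor(df):
    # Answer's approach: parse to plain dicts, build one DataFrame at once.
    return df.join(pd.DataFrame([json.loads(x) for x in df['event_json']]))


# Same values in both results; the new columns may come out in a different order.
pd.testing.assert_frame_equal(with_apply(big), with_constructor(big), check_like=True)

for f in (with_apply, with_constructor):
    print(f.__name__, timeit.timeit(lambda: f(big), number=1))
```

Absolute timings depend on the pandas version and the data, so treat the printed numbers as a relative comparison only.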
If you also need to remove the source column, use DataFrame.pop:
df = df12.join(pd.DataFrame([json.loads(x) for x in df12.pop('event_json')]))
print (df)
event_datetime event_name col12 col13 lvl snum tok udid
0 2019-01-08 00:09:30 AdsClick NaN NaN 450 257 1212 122112
1 2019-01-08 00:43:21 AdsClick NaN NaN 902 1387 4214 3123
2 2019-02-08 00:05:01 NaN 2416 2416 1415 2416 213123 214124
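One caveat worth checking: DataFrame.join aligns rows on the index, and the DataFrame built from the list comprehension always gets a default 0..n-1 index. That only lines up with df12 if df12 also has a default index, as it does when it comes straight from json_normalize. This minimal sketch uses a hypothetical two-row frame indexed 10 and 20 (for example, left over from filtering rows) and passes index=... to keep the rows aligned; the names s and bad are illustrative.

```python
import json

import pandas as pd

# Hypothetical frame with a non-default index, e.g. left over from filtering.
df12 = pd.DataFrame(
    {'event_name': ['AdsClick', 'AdsClick'],
     'event_json': ['{"lvl":"450","tok":"1212"}', '{"lvl":"902","tok":"4214"}']},
    index=[10, 20],
)

# pop removes the source column from df12 and returns it with df12's index.
s = df12.pop('event_json')

# Reuse that index so join matches each parsed row with its source row.
df = df12.join(pd.DataFrame([json.loads(x) for x in s], index=s.index))
print(df)

# Without index=..., the parsed rows are labelled 0 and 1, match no row in
# df12, and join (a left join by default) fills the new columns with NaN.
bad = df12.join(pd.DataFrame([json.loads(x) for x in s]))
print(bad)
```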