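python-3.x - Pandas: create a new column based on another column that holds a list of objects
Question
I am trying to create/add a new column in the following DataFrame (df_new):
I want this new column (df_new['category']) to be derived from df_new['tags'].
The tags column is a list of objects, and the value I want to retrieve is the category. If there is no category, I want to set it to unknown.
Here is a sample of my JSON file: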
{"submissionTime":"2019-02-25T09:26:00","b_data":{"bName":"Masato","b_Acc":[{"id":0,"transactions":[{"date":"2019-12-19","text":"PERIODICAL PAYMENT","amount":3397,"type":"","tags":[{"institution":"University of MC"},{"lenderType":"private"},{"category":"birdy"},{"creditDebit":"credit"}]},{"date":"2019-12-03","text":"LINE FEE","amount":-460.21,"type":"Overdrawn Fees","tags":[{"category":"Overdrawn"},{"creditDebit":"debit"}]},{"date":"2019-12-31","text":"INTEREST","amount":-871.62,"type":"Interest Charge","tags":[{"category":"Fees"},{"creditDebit":"debit"}]},{"date":"2019-12-31","text":"LOAN SERVICE FEE","amount":-120,"type":"Loan Related Fees","tags":[{"category":"Fees"},{"creditDebit":"debit"}]},{"date":"2019-12-18","text":"PERIODICAL PAYMENT","amount":3397,"type":"","tags":[{"institution":"University of MC"},{"lenderType":"private"},{"category":"birdy"},{"creditDebit":"credit"}]},{"date":"2019-12-02","text":"LINE FEE","amount":-498.34,"type":"Overdrawn Fees","tags":[{"category":"Overdrawn"},{"creditDebit":"debit"}]},{"date":"2019-11-29","text":"INTEREST","amount":-794.4,"type":"Interest Charge","tags":[{"category":"Fees"},{"creditDebit":"debit"}]},{"date":"2019-11-19","text":"PERIODICAL PAYMENT","amount":3397,"type":"","tags":[{"institution":"University of MC"},{"lenderType":"private"},{"category":"birdy"},{"creditDebit":"credit"}]},{"date":"2019-11-01","text":"LINE FEE","amount":-484.87,"type":"Overdrawn Fees","tags":[{"category":"Overdrawn"},{"creditDebit":"debit"}]},{"date":"2019-10-31","text":"INTEREST","amount":-882.04,"type":"Interest Charge","tags":[{"category":"Fees"},{"creditDebit":"debit"}]},{"date":"2019-10-21","text":"PERIODICAL PAYMENT","amount":3397,"type":"","tags":[{"institution":"University of MC"},{"lenderType":"private"},{"category":"birdy"},{"creditDebit":"credit"}]},{"date":"2019-10-01","text":"LINE FEE","amount":-503.59,"type":"Overdrawn Fees","tags":[{"category":"Overdrawn"},{"creditDebit":"debit"}]},{"date":"2019-09-30","text":"INTEREST","amount":-916.98,"type":"Interest Charge","tags":[{"category":"Fees"},{"creditDebit":"debit"}]},{"date":"2019-09-30","text":"LOAN SERVICE FEE","amount":-120,"type":"Loan Related Fees","tags":[{"category":"Fees"},{"creditDebit":"debit"}]},{"date":"2019-09-19","text":"PERIODICAL PAYMENT","amount":3397,"type":"","tags":[{"institution":"University of MC"},{"lenderType":"private"},{"category":"birdy"},{"creditDebit":"credit"}]},{"date":"2019-09-02","text":"LINE FEE","amount":-489.65,"type":"Overdrawn Fees","tags":[{"category":"Overdrawn"},{"creditDebit":"debit"}]},{"date":"2019-08-30","text":"INTEREST","amount":-892.13,"type":"Interest Charge","tags":[{"category":"Fees"},{"creditDebit":"debit"}]}]}]}}
This is what I have managed so far:
import json
import pandas as pd

with open('question.json') as json_data:
    d = json.load(json_data)

# One row per account; the 'transactions' column holds a list of dicts
df = pd.json_normalize(d['b_data']['b_Acc'])

# Collect every account's transactions into a single flat list
# https://pandas.pydata.org/pandas-docs/stable/merging.html
frames = []
for index, row in df.iterrows():
    frames = frames + row['transactions']
df_new = pd.DataFrame(frames)

# Takes the first tag of each row, which is not always the category
df_new['category'] = df_new['tags'].apply(pd.Series)[0]
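This might work if the category were always the first element of that list, but in row 0 the first element is institution, and for the second row the element is creditDebit (which I would like to be unknown, since there is no category).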
Solution
This does what you were doing in the for loop:
s = pd.DataFrame(pd.DataFrame(df.transactions.tolist()).stack().str['tags'].tolist())
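The one-liner above yields the tag dicts spread over positional columns, so the category still has to be picked out by key rather than by position. A minimal sketch of one way to do that follows. The helper name get_category, the miniature input dict, and the "TRANSFER" transaction are hypothetical and only stand in for the JSON in the question. The use of record_path in pd.json_normalize is a swapped-in alternative to the for loop, not the method from the answer.

```python
import pandas as pd

# Hypothetical miniature of the question's JSON: the second transaction
# is made up for illustration and deliberately has no 'category' tag.
d = {
    "b_data": {
        "b_Acc": [
            {
                "id": 0,
                "transactions": [
                    {"text": "LINE FEE",
                     "tags": [{"category": "Overdrawn"}, {"creditDebit": "debit"}]},
                    {"text": "TRANSFER",
                     "tags": [{"institution": "University of MC"}, {"creditDebit": "credit"}]},
                ],
            }
        ]
    }
}


def get_category(tags, default="unknown"):
    """Return the first 'category' value in a list of tag dicts, else the default."""
    if not isinstance(tags, list):
        return default
    for tag in tags:
        if isinstance(tag, dict) and "category" in tag:
            return tag["category"]
    return default


# record_path flattens the nested transactions into one row each;
# meta carries the account id along with every transaction.
df_new = pd.json_normalize(d["b_data"]["b_Acc"], record_path="transactions", meta=["id"])

# Look the category up by key, so its position inside 'tags' does not matter.
df_new["category"] = df_new["tags"].apply(get_category)
print(df_new[["text", "category"]])
```

Searching by key rather than taking element 0 is what handles rows where institution or creditDebit comes first, and the default argument covers rows with no category at all.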