Pandas: create a new column, based on another column that is a list of objects

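Problem description

I am trying to create/add a new column in the following DataFrame (df_new):

[Image: DataFrame example]

I want this new column (df['category']) to be derived from df['tags'].

The tags column holds a list of objects. The value I want to retrieve is the category; if a row has no category, I want to set it to unknown.

Here is a sample of my JSON file: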

{"submissionTime":"2019-02-25T09:26:00","b_data":{"bName":"Masato","b_Acc":[{"id":0,"transactions":[{"date":"2019-12-19","text":"PERIODICAL PAYMENT","amount":3397,"type":"","tags":[{"institution":"University of MC"},{"lenderType":"private"},{"category":"birdy"},{"creditDebit":"credit"}]},{"date":"2019-12-03","text":"LINE FEE","amount":-460.21,"type":"Overdrawn Fees","tags":[{"category":"Overdrawn"},{"creditDebit":"debit"}]},{"date":"2019-12-31","text":"INTEREST","amount":-871.62,"type":"Interest Charge","tags":[{"category":"Fees"},{"creditDebit":"debit"}]},{"date":"2019-12-31","text":"LOAN SERVICE FEE","amount":-120,"type":"Loan Related Fees","tags":[{"category":"Fees"},{"creditDebit":"debit"}]},{"date":"2019-12-18","text":"PERIODICAL PAYMENT","amount":3397,"type":"","tags":[{"institution":"University of MC"},{"lenderType":"private"},{"category":"birdy"},{"creditDebit":"credit"}]},{"date":"2019-12-02","text":"LINE FEE","amount":-498.34,"type":"Overdrawn Fees","tags":[{"category":"Overdrawn"},{"creditDebit":"debit"}]},{"date":"2019-11-29","text":"INTEREST","amount":-794.4,"type":"Interest Charge","tags":[{"category":"Fees"},{"creditDebit":"debit"}]},{"date":"2019-11-19","text":"PERIODICAL PAYMENT","amount":3397,"type":"","tags":[{"institution":"University of MC"},{"lenderType":"private"},{"category":"birdy"},{"creditDebit":"credit"}]},{"date":"2019-11-01","text":"LINE FEE","amount":-484.87,"type":"Overdrawn Fees","tags":[{"category":"Overdrawn"},{"creditDebit":"debit"}]},{"date":"2019-10-31","text":"INTEREST","amount":-882.04,"type":"Interest Charge","tags":[{"category":"Fees"},{"creditDebit":"debit"}]},{"date":"2019-10-21","text":"PERIODICAL PAYMENT","amount":3397,"type":"","tags":[{"institution":"University of MC"},{"lenderType":"private"},{"category":"birdy"},{"creditDebit":"credit"}]},{"date":"2019-10-01","text":"LINE FEE","amount":-503.59,"type":"Overdrawn Fees","tags":[{"category":"Overdrawn"},{"creditDebit":"debit"}]},{"date":"2019-09-30","text":"INTEREST","amount":-916.98,"type":"Interest Charge","tags":[{"category":"Fees"},{"creditDebit":"debit"}]},{"date":"2019-09-30","text":"LOAN SERVICE FEE","amount":-120,"type":"Loan Related Fees","tags":[{"category":"Fees"},{"creditDebit":"debit"}]},{"date":"2019-09-19","text":"PERIODICAL PAYMENT","amount":3397,"type":"","tags":[{"institution":"University of MC"},{"lenderType":"private"},{"category":"birdy"},{"creditDebit":"credit"}]},{"date":"2019-09-02","text":"LINE FEE","amount":-489.65,"type":"Overdrawn Fees","tags":[{"category":"Overdrawn"},{"creditDebit":"debit"}]},{"date":"2019-08-30","text":"INTEREST","amount":-892.13,"type":"Interest Charge","tags":[{"category":"Fees"},{"creditDebit":"debit"}]}]}]}}

This is what I have managed so far:

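import json
import numpy as np
import pandas as pd

with open('question.json') as json_data:
    d = json.load(json_data)

# One row per account; each 'transactions' cell holds a list of dicts
df = pd.json_normalize(d['b_data']['b_Acc'])

frames = []

#https://pandas.pydata.org/pandas-docs/stable/merging.html
# Collect the transactions of every account into one flat list
for index, row in df.iterrows():
    frames.extend(row['transactions'])

df_new = pd.DataFrame(frames)

# Attempt: expand each tags list into columns and keep the first element
df_new['category'] = df_new['tags'].apply(pd.Series)[0]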

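This would work if the category were always the first element of the list, but in row 0 the first element is the institution, and another row may contain only creditDebit (which I want to become unknown, because there is no category).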

Tags: python-3.x, pandas, dataframe, lambda

Solution

This does what you were doing in the for loop:

s = pd.DataFrame(pd.DataFrame(df.transactions.tolist()).stack().str['tags'].tolist())
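The line above only spreads the tag dicts across columns; it does not yet pick out the category. One possible way to finish the job is to look each tag up by key instead of by position. The sketch below is not part of the original answer: `extract_category` is a hypothetical helper name, and the two sample transactions are abbreviated from the question's JSON, with the category tag deliberately removed from the second one to show the fallback.

```python
import pandas as pd

# Two sample transactions shaped like the JSON in the question.
# Assumption: the second one has had its "category" tag removed
# on purpose, so that the "unknown" fallback is exercised.
transactions = [
    {"text": "PERIODICAL PAYMENT",
     "tags": [{"institution": "University of MC"}, {"lenderType": "private"},
              {"category": "birdy"}, {"creditDebit": "credit"}]},
    {"text": "LINE FEE", "tags": [{"creditDebit": "debit"}]},
]
df_new = pd.DataFrame(transactions)


def extract_category(tags, default="unknown"):
    """Return the first 'category' value found in a list of one-key dicts."""
    if not isinstance(tags, list):  # e.g. NaN when a row has no tags at all
        return default
    for tag in tags:
        if isinstance(tag, dict) and "category" in tag:
            return tag["category"]
    return default


# Look up the category by key, so its position in the list does not matter
df_new["category"] = df_new["tags"].apply(extract_category)
print(df_new[["text", "category"]])
```

Because the lookup goes by key, it should not matter whether the category is the first, third, or a missing element of the list; rows without one fall back to the default.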
