Exploding my pandas columns no longer works

Problem description

I need to explode some columns in the following dataset:

Date        Borr       Year    Acc                            Grade
2020-06-30  borrower1  2001.0  ['acc1', 'acc2', 'acc3',...]   [3.6, 5.2, 3.8,...]
2020-06-30  borrower1  2001.0  ['acc1', 'acc2', 'acc3',...]   [3.6, 5.2, 3.8,...]
2020-06-30  borrower2  2005.0  ['acc23', 'acc21']             [6.6, 5.2]

I did the following:

new_df=df.apply(pd.Series.explode).reset_index()

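But the result still contains lists. Each account has one grade. I have always used explode as in the example above, and honestly it worked until a few days ago. Does anyone know why it no longer works? I have already looked at other questions (for example, Pandas explode multiple columns), so please do not mark this question as a duplicate, because it is not one.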

Tags: python, pandas

Solution


I think you first need to convert the columns to lists, because explode does not work when the cells contain strings:

import ast
df[['Acc', 'Grade']] = df[['Acc', 'Grade']].applymap(ast.literal_eval)

Or, if some None values are possible (written as null in the strings, since json.loads expects valid JSON), use:

import json
df[['Acc', 'Grade']] = df[['Acc', 'Grade']].applymap(json.loads)
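The difference between the two parsers can be checked on a few strings before applying either one to the whole frame. The following is a minimal sketch using only the standard library; the sample strings are made up for illustration and are not taken from the question's data.

```python
import ast
import json

single_quoted = "['acc1', 'acc2']"   # Python repr style, with single quotes
with_none = "[3.6, None]"            # Python literal containing None
with_null = "[3.6, null]"            # JSON text containing null

# ast.literal_eval understands Python literals, including None.
parsed_repr = ast.literal_eval(single_quoted)
parsed_none = ast.literal_eval(with_none)

# json.loads understands JSON, where the missing value is spelled null.
parsed_null = json.loads(with_null)

# json.loads rejects single-quoted strings because they are not valid JSON.
try:
    json.loads(single_quoted)
    json_accepts_single_quotes = True
except json.JSONDecodeError:
    json_accepts_single_quotes = False

print(parsed_repr, parsed_none, parsed_null, json_accepts_single_quotes)
```

So json.loads is probably only a fit when the cells were written out as JSON; for cells that look like Python list reprs, ast.literal_eval is likely the safer choice.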

If some of the data is malformed, you can use:

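import ast

def custom_conv(x):
    # Parse a stringified list; fall back to an empty list if the value is malformed.
    try:
        return ast.literal_eval(x)
    except (ValueError, SyntaxError, TypeError):
        return []

df[['Acc', 'Grade']] = df[['Acc', 'Grade']].applymap(custom_conv)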
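Before picking one of the conversions above, it may help to confirm that the cells really are strings rather than lists. The sketch below counts the Python types found in a column; the helper name cell_type_counts and the sample frame are hypothetical and only serve as an illustration.

```python
import pandas as pd

# Hypothetical sample: one cell is a real list, the other two are strings.
sample = pd.DataFrame({
    "Acc": ["['acc1', 'acc2']", ["acc3"], "['acc4']"],
})

def cell_type_counts(frame, column):
    """Count how many cells of each Python type a column contains."""
    return frame[column].map(lambda v: type(v).__name__).value_counts().to_dict()

counts = cell_type_counts(sample, "Acc")
print(counts)
```

If the count shows str entries, the column was most likely read back from a file as text, and explode will leave those cells untouched until they are parsed.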

Edit:

For me, this works fine in pandas 1.2.3:

import pandas as pd

a = ['[6.6, 5.2, 4.8, 3.1, 2.9]', '[6.6, 5.2, 4.8, 3.1, 2.9]', '[6.6, 2.2, 4.8, 3.1, 2.9]', '[33.7, 35.0, 31.9, 25.2, 25.9]', '[33.7, 35.0, 31.9, 25.2, 25.9]']
b = ["['acc1', 'acc2', 'acc3', 'acc4', 'acc5']", "['acc1', 'acc2', 'acc3', 'acc4', 'acc5']", "['acc1', 'acc2', 'acc3', 'acc4', 'acc5']", "['acc10', 'acc11', 'acc14', 'acc5', 'acc6']", "['acc10', 'acc11', 'acc14', 'acc5', 'acc6']"]
df = pd.DataFrame({'Year':[2001,2001,2005, 2006, 2007],
                   'Acc':a,
                   'Grade':b})

print (df)
   Year                             Acc  \
0  2001       [6.6, 5.2, 4.8, 3.1, 2.9]   
1  2001       [6.6, 5.2, 4.8, 3.1, 2.9]   
2  2005       [6.6, 2.2, 4.8, 3.1, 2.9]   
3  2006  [33.7, 35.0, 31.9, 25.2, 25.9]   
4  2007  [33.7, 35.0, 31.9, 25.2, 25.9]   

                                         Grade  
0     ['acc1', 'acc2', 'acc3', 'acc4', 'acc5']  
1     ['acc1', 'acc2', 'acc3', 'acc4', 'acc5']  
2     ['acc1', 'acc2', 'acc3', 'acc4', 'acc5']  
3  ['acc10', 'acc11', 'acc14', 'acc5', 'acc6']  
4  ['acc10', 'acc11', 'acc14', 'acc5', 'acc6'] 

import ast
df[['Acc', 'Grade']] = df[['Acc', 'Grade']].applymap(ast.literal_eval)
 
df = df.apply(pd.Series.explode)
print (df.head(10))
   Year  Acc Grade
0  2001  6.6  acc1
0  2001  5.2  acc2
0  2001  4.8  acc3
0  2001  3.1  acc4
0  2001  2.9  acc5
1  2001  6.6  acc1
1  2001  5.2  acc2
1  2001  4.8  acc3
1  2001  3.1  acc4
1  2001  2.9  acc5
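As an alternative, on pandas 1.3.0 or later DataFrame.explode accepts a list of columns, so the paired columns can be exploded in one call. The sketch below assumes that pandas version and that the lists in each row have matching lengths; the sample values are illustrative and not the question's full data. It converts the columns one at a time with Series.map, which avoids applymap (deprecated in recent pandas releases).

```python
import ast
import pandas as pd

# Illustrative sample: list columns stored as strings.
df = pd.DataFrame({
    "Year": [2001, 2005],
    "Acc": ["['acc1', 'acc2', 'acc3']", "['acc23', 'acc21']"],
    "Grade": ["[3.6, 5.2, 3.8]", "[6.6, 5.2]"],
})

# Turn each stringified list into a real Python list.
for col in ["Acc", "Grade"]:
    df[col] = df[col].map(ast.literal_eval)

# Explode both columns together so each account stays paired with its grade.
exploded = df.explode(["Acc", "Grade"], ignore_index=True)
print(exploded)
```

Exploding both columns in a single call keeps each account aligned with its grade, and ignore_index=True gives the result a fresh 0..n-1 index instead of repeating the original row labels.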
