Is there a way to group lists in a DataFrame?

Problem description

I have a DataFrame like this:

n° , list_code
1    ["AR13","BD34","TA42","LK87"]
2    ["KA54","OP98"]
1    ["LA14","LK87","AR13"]
3    ["GH53"]
2    ["LO54","LP87"]

I would like an output like this:

n° ,  list_code 
1     ["AR13","BD34","TA42","LK87","LA14","LK87","AR13"]
2     ["KA54","OP98","LO54","LP87"]
3     ["GH53"]

So I want to group by "n°" and concatenate the lists. Then, for each row, I want to show the number of occurrences of each code, for example:

n° ,  list_code                                              , output_final
1     ["AR13","BD34","TA42","LK87","LA14","LK87","AR13"]     , {"AR13":2,"BD34":1,"TA42":1,"LK87":2 ..}
2     ["KA54","OP98","LO54","LP87"]                          , {"KA54":1,"OP98":1 ...}
3     ["GH53"]                                                , {"GH53":1}

Tags: python, python-3.x, pandas, dataframe

Solution

Just another way, without importing anything more:

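# explode() gives one row per code; value_counts() is indexed by code,
# so to_dict() pairs each code with its own count.
df.set_index("n°")\
  .list_code\
  .explode()\
  .groupby(level=0)\
  .agg(lambda x: x.value_counts().to_dict())

n°
1    {'AR13': 2, 'LK87': 2, 'BD34': 1, 'TA42': 1, '...
2         {'KA54': 1, 'OP98': 1, 'LO54': 1, 'LP87': 1}
3                                          {'GH53': 1}
Name: list_code, dtype: object

Keys with equal counts may appear in a different order depending on the pandas version.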
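
To try this end to end, here is a minimal sketch that builds the sample frame from the question and also produces the concatenated list column the question asks for. The variable names (codes, grouped, result) are my own choices, and the counting step uses x.value_counts().to_dict() so that each code is paired with its own count. Treat it as one possible arrangement under those assumptions rather than the definitive way to do it.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "n°": [1, 2, 1, 3, 2],
    "list_code": [
        ["AR13", "BD34", "TA42", "LK87"],
        ["KA54", "OP98"],
        ["LA14", "LK87", "AR13"],
        ["GH53"],
        ["LO54", "LP87"],
    ],
})

# One row per code, still indexed by n°.
codes = df.set_index("n°")["list_code"].explode()
grouped = codes.groupby(level=0)

# Re-collecting the exploded codes concatenates the lists of each group.
result = grouped.agg(list).to_frame("list_code")

# value_counts() is indexed by code, so to_dict() yields code -> count.
result["output_final"] = grouped.agg(lambda x: x.value_counts().to_dict())

result = result.reset_index()
print(result.to_string(index=False))
```

Grouping the exploded Series once and reusing it for both columns keeps the list and the counts derived from the same rows.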

Disclaimer: I'm fairly sure this is slower than @jezrael's solution.
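
For comparison, here is a sketch of a variant that skips explode() and counts directly with collections.Counter. This is only my assumption about what a faster approach could look like. It is not a reproduction of @jezrael's answer, which is not shown here, and I have not timed it. The helper name count_codes is hypothetical.

```python
from collections import Counter
from itertools import chain

import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "n°": [1, 2, 1, 3, 2],
    "list_code": [
        ["AR13", "BD34", "TA42", "LK87"],
        ["KA54", "OP98"],
        ["LA14", "LK87", "AR13"],
        ["GH53"],
        ["LO54", "LP87"],
    ],
})


def count_codes(lists):
    """Count every code across all the lists of one group (hypothetical helper)."""
    return dict(Counter(chain.from_iterable(lists)))


# Each group is a Series of lists; chain flattens them before counting.
counts = df.groupby("n°")["list_code"].agg(count_codes)
print(counts)
```

This does need two standard-library imports, which is the trade-off against the import-free version above.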

