Count consecutive elements in a list-like DataFrame column, per cell, into new columns

Problem description

I have the following df:

df6 = pd.DataFrame({'name':['Sara',  'John', 'Jack'],
                   'places': ['UK,UK,UK,UK,US,CA', 'US,US,US,CA,CA,CA', 'Mexico,AUS,AUS,Mexico,Mexico']
                   })

df6

which looks like:

    name    places
0   Sara    UK,UK,UK,UK,US,CA
1   John    US,US,US,CA,CA,CA
2   Jack    Mexico,AUS,AUS,Mexico,Mexico

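The places column only involves 5 countries. What I want is to find the number of consecutive visits to each country, so the output would look like this: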

    name    UK   US   CA   Mexico   AUS    
0   Sara    4    0    0       0      0
1   John    0    3    3       0      0  
2   Jack    0    0    0       2      2

What I have done so far (using groupby from itertools and Counter from collections):

df6['consecutive'] = df6.places.map(lambda x: [Counter(group[1]) for group in groupby(x.split(','))])

This gives me a list of dicts per row:

    name    places                        consecutive
0   Sara    UK,UK,UK,UK,US,CA             [{'UK': 4}, {'US': 1}, {'CA': 1}]
1   John    US,US,US,CA,CA,CA             [{'US': 3}, {'CA': 3}]
2   Jack    Mexico,AUS,AUS,Mexico,Mexico  [{'Mexico': 1}, {'AUS': 2}, {'Mexico': 2}]
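For reference, here is a minimal sketch of what the groupby step does for a single cell: it splits the string into runs of identical neighbours and reports each run's length. The helper name `run_lengths` is hypothetical and only used for illustration.

```python
from itertools import groupby


def run_lengths(places):
    """Return (country, run length) for each run of identical neighbours."""
    return [
        (country, sum(1 for _ in run))  # count the items in this run
        for country, run in groupby(places.split(","))
    ]


runs = run_lengths("Mexico,AUS,AUS,Mexico,Mexico")
print(runs)  # [('Mexico', 1), ('AUS', 2), ('Mexico', 2)]
```

Note that Mexico appears twice, once per run, because groupby only merges adjacent equal items.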

Now I am stuck on how to iterate over each cell of the consecutive column, keep only the values > 1, and reshape df6 into the final output:

    name    UK   US   CA   Mexico   AUS    
0   Sara    4    0    0       0      0
1   John    0    3    3       0      0  
2   Jack    0    0    0       2      2

Tags: python-3.x, pandas, list, dataframe

Solution


You can use pd.crosstab, which counts every visit per country (not only the consecutive ones):

df6["places"] = df6["places"].apply(lambda x: x.split(","))
df6 = df6.explode("places")

out = pd.crosstab(df6["name"], df6["places"])
out.index.name = None
out.columns.name = None
print(out)

Prints:

      AUS  CA  Mexico  UK  US
Jack    2   0       3   0   0
John    0   3       0   0   3
Sara    0   1       0   4   1
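The crosstab snippet above overwrites df6 with the exploded frame, so any later code that expects the original comma-separated strings would fail. A possible variant, sketched here under the assumption that you want to keep the original frame intact, builds the long form in a temporary variable instead (the names `df`, `long_form` and `totals` are illustrative, not from the original answer):

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["Sara", "John", "Jack"],
    "places": ["UK,UK,UK,UK,US,CA", "US,US,US,CA,CA,CA", "Mexico,AUS,AUS,Mexico,Mexico"],
})

# assign() returns a new frame, so df keeps its comma-separated strings.
long_form = df.assign(places=df["places"].str.split(",")).explode("places")

# One row per name, one column per country, cells hold total visit counts.
totals = pd.crosstab(long_form["name"], long_form["places"])
print(totals)
```

As with the original snippet, this counts all visits, so Jack gets 3 for Mexico even though his longest consecutive stay there is 2.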

Edit: to sum the consecutive column (only runs longer than 1), starting again from the original df6:

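from itertools import groupby
from collections import Counter

# For each row, keep only the runs of identical neighbours longer than 1
# (shorter runs become empty dicts).
df6["consecutive"] = df6.places.map(
    lambda x: [
        {k: v for k, v in Counter(group[1]).items() if v > 1}
        for group in groupby(x.split(","))
    ]
)

# One row per run, then expand the dicts into one column per country.
df6 = df6.explode("consecutive").reset_index(drop=True)
out = (
    pd.concat([df6, pd.DataFrame(df6.pop("consecutive").tolist())], axis=1)
    .groupby("name")
    .sum(numeric_only=True)  # leave out the string column "places"
)
print(out)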

Prints:

       UK   US   CA  AUS  Mexico
name                            
Jack  0.0  0.0  0.0  2.0     2.0
John  0.0  3.0  3.0  0.0     0.0
Sara  4.0  0.0  0.0  0.0     0.0
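The result above has float values and its columns follow the order in which countries are first seen. If integer counts in the column order from the question are preferred, one possible alternative is to build a complete dict per row up front. This is only a sketch: the `COUNTRIES` list, the `consecutive_counts` helper and the `min_run` parameter are assumptions introduced here, not part of the original answer.

```python
from itertools import groupby

import pandas as pd

df = pd.DataFrame({
    "name": ["Sara", "John", "Jack"],
    "places": ["UK,UK,UK,UK,US,CA", "US,US,US,CA,CA,CA", "Mexico,AUS,AUS,Mexico,Mexico"],
})

# Assumed fixed set of countries, in the column order shown in the question.
COUNTRIES = ["UK", "US", "CA", "Mexico", "AUS"]


def consecutive_counts(places, min_run=2):
    """Per country, sum the lengths of all runs with at least `min_run` items."""
    totals = dict.fromkeys(COUNTRIES, 0)
    for country, run in groupby(places.split(",")):
        length = sum(1 for _ in run)
        if length >= min_run:
            totals[country] = totals.get(country, 0) + length
    return totals


# Each row becomes a dict with every country present, so no NaN (and no floats) appear.
counts = pd.DataFrame(df["places"].map(consecutive_counts).tolist(), index=df.index)
result = pd.concat([df[["name"]], counts], axis=1)
print(result)
```

With the sample data this yields 4 for Sara/UK, 3 each for John/US and John/CA, and 2 each for Jack/Mexico and Jack/AUS, with zeros elsewhere.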
