pandas: group and sum specific columns based on a mapping in another DataFrame

Problem description

How can I use one DataFrame to group the columns of another DataFrame?

I have the following two DataFrames:

   |col1|col2|col3|col4|col5|col6|col7
--------------------------------------
0  |1   |10  |10  |100 |100 |10  |100
1  |1   |10  |10  |100 |100 |10  |100

   |name  |group
  ---------------
0  |col1  |group1
1  |col2  |group2
2  |col3  |group2
3  |col4  |group3
4  |col5  |group3
5  |col6  |group2
6  |col7  |group3

I want to group and sum the columns of the first DataFrame using the mapping in the second DataFrame. I want to obtain:

   |group1|group2|group3
--------------------------
0  |1     |30    |300
1  |1     |30    |300

col2, col3, and col6 are summed together because they map to group2. col4, col5, and col7 are summed together because they map to group3. col1 maps to group1 on its own.

To recreate the DataFrames:

import pandas as pd

# initial dataframe
df = pd.DataFrame(
    {
        "col1": [1, 1],
        "col2": [10, 10],
        "col3": [10, 10],
        "col4": [100, 100],
        "col5": [100, 100],
        "col6": [10, 10],
        "col7": [100, 100],
    }
)
# dataframe containing all the mappings
mapping = pd.DataFrame(
    {
        "name": ["col1", "col2", "col3", "col4", "col5", "col6", "col7"],
        "group": ["group1", "group2", "group2", "group3", "group3", "group2", "group3"],
    }
)
# dataframe I want to obtain
final = pd.DataFrame(
    {
        "group1": [1, 1],
        "group2": [30, 30],
        "group3": [300, 300],
    }
)

Tags: pandas, pandas-groupby

Solution


Map the column names to their groups with Index.map, then aggregate with sum:

s = mapping.set_index('name')['group']

final = df.groupby(df.columns.map(s), axis=1).sum()
print(final)
   group1  group2  group3
0       1      30     300
1       1      30     300

Or rename the columns first, then aggregate by column label:

s = mapping.set_index('name')['group']

final = df.rename(columns=s).groupby(level=0, axis=1).sum()
print(final)
   group1  group2  group3
0       1      30     300
1       1      30     300
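Both approaches above pass axis=1 to groupby, which is deprecated as of pandas 2.1. A minimal sketch of the same two ideas without axis=1, assuming it is acceptable to transpose the frame, group its rows, and transpose back (the names via_map and via_rename are only illustrative):

```python
import pandas as pd

df = pd.DataFrame(
    {
        "col1": [1, 1],
        "col2": [10, 10],
        "col3": [10, 10],
        "col4": [100, 100],
        "col5": [100, 100],
        "col6": [10, 10],
        "col7": [100, 100],
    }
)
mapping = pd.DataFrame(
    {
        "name": ["col1", "col2", "col3", "col4", "col5", "col6", "col7"],
        "group": ["group1", "group2", "group2", "group3", "group3", "group2", "group3"],
    }
)

# Series that maps each column name to its group
s = mapping.set_index("name")["group"]

# Variant 1: transpose so columns become rows, group the rows by the
# mapped labels, sum, then transpose back
via_map = df.T.groupby(df.columns.map(s)).sum().T

# Variant 2: rename the columns to their group names, then group the
# transposed rows by their (now duplicated) index labels
via_rename = df.rename(columns=s).T.groupby(level=0).sum().T

print(via_map)
print(via_rename)
```

With the full mapping, both variants should produce the same group1, group2, group3 frame as the output shown above. The double transpose is simple but copies the data, so it may be slow on very wide or very long frames.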

The two approaches give different output when some column names are missing from mapping:

# dataframe containing all the mappings (removed last 2 values)
mapping = pd.DataFrame(
    {
        "name": ["col1", "col2", "col3", "col4", "col5"],
        "group": ["group1", "group2", "group2", "group3", "group3"],
    }
)

s = mapping.set_index('name')['group']

# only matched columns are aggregated; unmatched columns are dropped
final = df.groupby(df.columns.map(s), axis=1).sum()
print(final)
   group1  group2  group3
0       1      20     200
1       1      20     200

# unmatched columns are kept in the output unchanged
final = df.rename(columns=s).groupby(level=0, axis=1).sum()
print(final)
   col6  col7  group1  group2  group3
0    10   100       1      20     200
1    10   100       1      20     200
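Because the two approaches silently treat unmapped columns differently, it can help to list those columns before grouping. A minimal sketch, assuming unmapped columns should be collected into a single group labelled "other" (a hypothetical label chosen for illustration, not part of the original question):

```python
import pandas as pd

df = pd.DataFrame(
    {
        "col1": [1, 1],
        "col2": [10, 10],
        "col3": [10, 10],
        "col4": [100, 100],
        "col5": [100, 100],
        "col6": [10, 10],
        "col7": [100, 100],
    }
)
# mapping without entries for col6 and col7
mapping = pd.DataFrame(
    {
        "name": ["col1", "col2", "col3", "col4", "col5"],
        "group": ["group1", "group2", "group2", "group3", "group3"],
    }
)

s = mapping.set_index("name")["group"]

# Columns of df that have no entry in the mapping
unmapped = df.columns.difference(s.index).tolist()
print(unmapped)  # ['col6', 'col7']

# Assumption: unmapped columns fall back to the hypothetical label "other"
keys = df.columns.map(lambda col: s.get(col, "other"))

# Transpose, group the rows by label, sum, and transpose back
result = df.T.groupby(keys).sum().T
print(result)
```

Checking unmapped first makes the choice explicit: drop those columns, keep them as they are, or assign them to a fallback group as sketched here.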
