How to create new summary rows in a pandas DataFrame based on two columns

Problem description

I have the following pandas DataFrame.

import numpy as np
import pandas as pd

d = {
    'id1': ['85643', '85644', '8564312', '8564314', '85645', '8564316', '85646', '8564318', '85647', '85648', '85649', '85655'],
    'ID': ['G-00001', 'G-00001', 'G-00002', 'G-00002', 'G-00001', 'G-00002', 'G-00001', 'G-00002', 'G-00001', 'G-00001', 'G-00001', 'G-00001'],
    'col1': [1, 2, 3, 4, 5, 60, 0, 0, 6, 3, 2, 4],
    'Goal': [np.nan, 56, np.nan, 89, 73, np.nan, np.nan, np.nan, np.nan, np.nan, 34, np.nan],
    'col2': [3, 4, 32, 43, 55, 610, 0, 0, 16, 23, 72, 48],
    'col3': [1, 22, 33, 44, 55, 60, 1, 5, 6, 3, 2, 4],
    'Name': ['a1asd', 'a2asd', 'aabsd', 'aabsd', 'a3asd', 'aabsd', 'aasd', 'aabsd', 'aasd', 'aasd', 'aasd', 'aasd'],
    'Date': ['2021-06-13', '2021-06-13', '2021-06-13', '2021-06-14', '2021-06-15', '2021-06-15', '2021-06-13', '2021-06-16', '2021-06-13', '2021-06-13', '2021-06-13', '2021-06-16'],
}


dff = pd.DataFrame(data=d)
dff
     id1     ID     col1 Goal   col2    col3   Name      Date
0   85643   G-00001 1   NaN     3       1     a1asd     2021-06-13
1   85644   G-00001 2   56.0000 4       22    a2asd     2021-06-13
2   8564312 G-00002 3   NaN     32      33    aabsd     2021-06-13
3   8564314 G-00002 4   89.0000 43      44    aabsd     2021-06-14
4   85645   G-00001 5   73.0000 55      55    a3asd     2021-06-15
5   8564316 G-00002 60  NaN     610     60    aabsd     2021-06-15
6   85646   G-00001 0   NaN     0       1     aasd      2021-06-13
7   8564318 G-00002 0   NaN     0       5     aabsd     2021-06-16
8   85647   G-00001 6   NaN     16      6     aasd      2021-06-13
9   85648   G-00001 3   NaN     23      3     aasd      2021-06-13
10  85649   G-00001 2   34.0000 72      2     aasd      2021-06-13
11  85655   G-00001 4   NaN     48      4     aasd      2021-06-16

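I want to aggregate some columns and append the results back to the same DataFrame, based on certain ids in the 'id1' column and on the 'Name' column. When I add those rows, I also want to give the 'ID' column a new value. For example, I have a few slices of the 'id1' column.

For the 'id1' values listed below, I only want to summarize the 'col1', 'col2', 'col3' and 'Name' columns. Then I want to add the resulting row back to the same DataFrame and give it a new id in the 'ID' column.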

b65 = ['85643','85645', '85655','85646']
b66 = ['85643','85645','85647','85648','85649','85644']
b67 = ['8564312','8564314','8564316','8564318']
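
As a minimal sketch of how one of these id lists picks rows out of the frame, the snippet below applies `Series.isin` to `id1`. It uses only the `id1`, `Name` and `col1` columns from the data above to stay short; the variable names `mask` and `subset_b65` are illustrative and do not appear in the question. Note that `id1` holds strings, so the lists must hold strings too.

```python
import pandas as pd

# Subset of the question's data: only the columns needed to show row selection.
dff = pd.DataFrame({
    'id1': ['85643', '85644', '8564312', '8564314', '85645', '8564316',
            '85646', '8564318', '85647', '85648', '85649', '85655'],
    'Name': ['a1asd', 'a2asd', 'aabsd', 'aabsd', 'a3asd', 'aabsd',
             'aasd', 'aabsd', 'aasd', 'aasd', 'aasd', 'aasd'],
    'col1': [1, 2, 3, 4, 5, 60, 0, 0, 6, 3, 2, 4],
})

b65 = ['85643', '85645', '85655', '85646']

# Boolean mask: True where the row's id1 is one of the ids in b65.
mask = dff['id1'].isin(b65)

# Keep only the matching rows (hypothetical name, for illustration).
subset_b65 = dff[mask]
print(subset_b65)
```

The rows come back in their original DataFrame order, not in the order the ids are listed in `b65`.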

I want the sum of col1 and col2 and the mean of col3. When I tried this with a dictionary comprehension, I was able to create a DataFrame like the one shown below.

Create the dictionary

d_map = {'b65': b65, 'b66': b66, 'b67': b67}
# dictionary comprehension
df = pd.DataFrame({k: dff[dff['id1'].isin(v)].agg({'col1': 'sum', 'col2': 'sum',
                                               'col3': 'mean', 'Name': 'unique'})
                   for k,v in d_map.items()}).T.reset_index()
# rename the columns
df = df.rename(columns={'index': 'ID'})
# concat the two frames
pd.concat([dff, df]).reset_index(drop=True)

     id1     ID     col1 Goal   col2    col3   Name                        Date
0   85643   G-00001 1   NaN     3       1     a1asd                     2021-06-13
1   85644   G-00001 2   56.00   4       22    a2asd                     2021-06-13
2   8564312 G-00002 3   NaN     32      33    aabsd                     2021-06-13
3   8564314 G-00002 4   89.00   43      44    aabsd                     2021-06-14
4   85645   G-00001 5   73.00   55      55    a3asd                     2021-06-15
5   8564316 G-00002 60  NaN     610     60    aabsd                     2021-06-15
6   85646   G-00001 0   NaN     0       1     aasd                      2021-06-13
7   8564318 G-00002 0   NaN     0       5     aabsd                     2021-06-16
8   85647   G-00001 6   NaN     16      6     aasd                      2021-06-13
9   85648   G-00001 3   NaN     23      3     aasd                      2021-06-13
10  85649   G-00001 2   34.00   72      2     aasd                      2021-06-13
11  85655   G-00001 4   NaN     48      4     aasd                      2021-06-16
12  NaN     b65     10  NaN    106    15.25 [a1asd, a3asd, aasd]            NaN
13  NaN     b66     19  NaN    173    14.83 [a1asd, a2asd, a3asd, aasd]     NaN
14  NaN     b67     67  NaN    685    35.50 [aabsd]                         NaN  

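However, I want to expand the list in the Name column and create a separate summary row for each name in that list, so that the result looks like the DataFrame below. Is it possible to do this?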

     id1     ID     col1 Goal   col2    col3   Name                        Date
0   85643   G-00001 1   NaN     3       1     a1asd                     2021-06-13
1   85644   G-00001 2   56.00   4       22    a2asd                     2021-06-13
2   8564312 G-00002 3   NaN     32      33    aabsd                     2021-06-13
3   8564314 G-00002 4   89.00   43      44    aabsd                     2021-06-14
4   85645   G-00001 5   73.00   55      55    a3asd                     2021-06-15
5   8564316 G-00002 60  NaN     610     60    aabsd                     2021-06-15
6   85646   G-00001 0   NaN     0       1     aasd                      2021-06-13
7   8564318 G-00002 0   NaN     0       5     aabsd                     2021-06-16
8   85647   G-00001 6   NaN     16      6     aasd                      2021-06-13
9   85648   G-00001 3   NaN     23      3     aasd                      2021-06-13
10  85649   G-00001 2   34.00   72      2     aasd                      2021-06-13
11  85655   G-00001 4   NaN     48      4     aasd                      2021-06-16
12  NaN     b65     1   NaN     3       1     a1asd                         NaN
13  NaN     b65     5   NaN     55      55    a3asd                         NaN
14  NaN     b65     4   NaN     48      2.5   aasd                          NaN
15  NaN     b66     1   NaN     3       1     a1asd                         NaN
16  NaN     b66     2   NaN     4       22    a2asd                         NaN
17  NaN     b66     5   NaN     55      55    a3asd                         NaN
18  NaN     b66     11  NaN    111      3.67  aasd                          NaN
19  NaN     b67     67  NaN    685    35.50   aabsd                         NaN

Thanks in advance!

Tags: python, pandas, dataframe, numpy, dictionary

Solution
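
One possible approach, offered as a sketch and not as a definitive answer: for each id list, filter the rows with `isin`, group the filtered rows by 'Name', aggregate with named aggregation, and tag the result with the new 'ID' before concatenating everything back onto the original frame. The names `parts`, `part`, `summary` and `result` are illustrative and do not come from the question. `sort=False` is assumed to be wanted so that names keep their order of first appearance.

```python
import numpy as np
import pandas as pd

d = {
    'id1': ['85643', '85644', '8564312', '8564314', '85645', '8564316',
            '85646', '8564318', '85647', '85648', '85649', '85655'],
    'ID': ['G-00001', 'G-00001', 'G-00002', 'G-00002', 'G-00001', 'G-00002',
           'G-00001', 'G-00002', 'G-00001', 'G-00001', 'G-00001', 'G-00001'],
    'col1': [1, 2, 3, 4, 5, 60, 0, 0, 6, 3, 2, 4],
    'Goal': [np.nan, 56, np.nan, 89, 73, np.nan,
             np.nan, np.nan, np.nan, np.nan, 34, np.nan],
    'col2': [3, 4, 32, 43, 55, 610, 0, 0, 16, 23, 72, 48],
    'col3': [1, 22, 33, 44, 55, 60, 1, 5, 6, 3, 2, 4],
    'Name': ['a1asd', 'a2asd', 'aabsd', 'aabsd', 'a3asd', 'aabsd',
             'aasd', 'aabsd', 'aasd', 'aasd', 'aasd', 'aasd'],
    'Date': ['2021-06-13', '2021-06-13', '2021-06-13', '2021-06-14',
             '2021-06-15', '2021-06-15', '2021-06-13', '2021-06-16',
             '2021-06-13', '2021-06-13', '2021-06-13', '2021-06-16'],
}
dff = pd.DataFrame(data=d)

b65 = ['85643', '85645', '85655', '85646']
b66 = ['85643', '85645', '85647', '85648', '85649', '85644']
b67 = ['8564312', '8564314', '8564316', '8564318']
d_map = {'b65': b65, 'b66': b66, 'b67': b67}

parts = []
for new_id, ids in d_map.items():
    part = (
        dff[dff['id1'].isin(ids)]                        # rows belonging to this id list
        .groupby('Name', as_index=False, sort=False)     # one group per distinct Name
        .agg(col1=('col1', 'sum'),
             col2=('col2', 'sum'),
             col3=('col3', 'mean'))
    )
    part.insert(0, 'ID', new_id)                         # label every row with the new ID
    parts.append(part)

# Stack the per-list summaries, then append them below the original rows.
summary = pd.concat(parts, ignore_index=True)
result = pd.concat([dff, summary], ignore_index=True)
print(result)
```

Columns that the summary rows do not have ('id1', 'Goal', 'Date') are filled with NaN by `pd.concat`, which matches the layout of the desired output in the question.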

