Python DataFrame: filling in rows that do not exist

Problem description

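I am wondering whether there is an efficient way to add rows to a DataFrame, containing for example the mean or a predefined value, when a particular value in another column does not have enough rows. That description is probably not the clearest, so here is an example.
Suppose we have the DataFrame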

df1
Client NumberOfProducts ID
A      1                2
A      5                1
B      1                2
B      6                1
C      9                1

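We want every client A, B, C, D to have 2 rows, whether or not those 2 rows already exist. For clients A and B we can simply copy the existing rows. For C we want to add one row with Client = C, NumberOfProducts = mean of the existing rows = 9, and an ID that does not matter (so we can set ID = smallest existing value - 1 = 0; any other value, even NaN, would also be fine). For client D there is no row at all, so we want to add 2 rows with NumberOfProducts equal to the constant 2.5. The output should look like this: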

df1
Client NumberOfProducts ID
A      1                2
A      5                1
B      1                2
B      6                1
C      9                1
C      9                0
D      2.5              NaN
D      2.5              NaN

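So far I have been iterating over the DataFrame and adding rows where necessary. Since that is very inefficient, any better solution would be highly appreciated.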

Tags: python, pandas, dataframe

Solution


Use:

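import pandas as pd

# sample data from the question
df = pd.DataFrame({'Client': list('AABBC'),
                   'NumberOfProducts': [1, 5, 1, 6, 9],
                   'ID': [2, 1, 2, 1, 1]})

clients = ['A','B','C','D']
N = 2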

# keep only clients from the list, and at most N rows per client
df = df[df['Client'].isin(clients)].groupby('Client').head(N)

# add a per-client row counter (0, 1) and pivot it into columns with unstack
df1 = df.set_index(['Client',df.groupby('Client').cumcount()]).unstack()
# client has only 1 row: fill the second NumberOfProducts from the first
df1[('NumberOfProducts',1)] = df1[('NumberOfProducts',1)].fillna(df1[('NumberOfProducts',0)])
# ... and fill the second ID with the first ID minus 1
df1[('ID',1)] = df1[('ID',1)].fillna(df1[('ID',0)] - 1)
# add clients missing from the data (as all-NaN rows) with reindex
df1 = df1.reindex(clients)
# fill NumberOfProducts of those added clients with the constant 2.5
df1['NumberOfProducts'] = df1['NumberOfProducts'].fillna(2.5)
print (df1)
       NumberOfProducts        ID     
                      0    1    0    1
Client                                
A                   1.0  5.0  2.0  1.0
B                   1.0  6.0  2.0  1.0
C                   9.0  9.0  1.0  0.0
D                   2.5  2.5  NaN  NaN

# finally, reshape back to the original long format
df2 = df1.stack().reset_index(level=1, drop=True).reset_index()
print (df2)
  Client  NumberOfProducts   ID
0      A               1.0  2.0
1      A               5.0  1.0
2      B               1.0  2.0
3      B               6.0  1.0
4      C               9.0  1.0
5      C               9.0  0.0
6      D               2.5  NaN
7      D               2.5  NaN
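
The unstack approach above fills the second column explicitly, so it fits N = 2. As a rough sketch of how the same rules might be applied for any N, the missing rows can be built separately and appended with concat. The function name `pad_clients` and its `default` parameter are made up for this illustration and are not part of the answer above; the ID rule (smallest existing ID minus 1, minus 2, ...) is one possible reading of the question, which says the ID value does not matter.

```python
import pandas as pd

def pad_clients(df, clients, n, default=2.5):
    """Return a frame with exactly n rows for every client in `clients`."""
    # Keep only the requested clients and at most n rows for each of them.
    df = df[df["Client"].isin(clients)].groupby("Client").head(n)

    # Per-client statistics; clients absent from df get NaN after reindex.
    stats = df.groupby("Client").agg(
        rows=("ID", "size"),
        mean_products=("NumberOfProducts", "mean"),
        min_id=("ID", "min"),
    ).reindex(clients)

    # How many filler rows each client still needs (n for absent clients).
    missing = (n - stats["rows"].fillna(0)).astype(int)

    # One filler row per missing slot, labelled with its client.
    filler = pd.DataFrame({"Client": stats.index.repeat(missing.to_numpy())})
    # Mean of the existing rows, or the constant for clients with no rows.
    filler["NumberOfProducts"] = (
        filler["Client"].map(stats["mean_products"]).fillna(default)
    )
    # Smallest existing ID minus 1, minus 2, ...; NaN for absent clients.
    step = filler.groupby("Client").cumcount() + 1
    filler["ID"] = filler["Client"].map(stats["min_id"]) - step

    out = pd.concat([df, filler], ignore_index=True)
    # Stable sort by position in `clients` keeps existing rows before fillers.
    order = {c: i for i, c in enumerate(clients)}
    out = out.sort_values("Client", key=lambda s: s.map(order), kind="stable")
    return out.reset_index(drop=True)

df = pd.DataFrame({"Client": list("AABBC"),
                   "NumberOfProducts": [1, 5, 1, 6, 9],
                   "ID": [2, 1, 2, 1, 1]})
result = pad_clients(df, ["A", "B", "C", "D"], n=2)
print(result)
```

For the sample data this should give the same eight rows as df2 above. With a larger n, every filler row for a client repeats that client's mean, which may or may not be what is wanted.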
