Python Pandas: updating identically named columns in a loop, with additional calculations

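Problem description

In a DataFrame, I want to iterate over the identically named columns (col1 through col14) and find the point at which their running sum exceeds the value in "val_n". I want four things:

1) exceed_when: the iteration at which the running sum exceeds "val_n"
2) sum_col: the sum of the identically named columns
3) at the point of exceeding, replace the corresponding col value with col - (sum_col - val_n), where sum_col is the running sum at that point
4) after the point of exceeding, replace the remaining col values with 0

The DataFrame looks like: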

id  col1  col2  col3  col4  col5  col6    col7  col8  col9  col10  col11  col12  col13  col14    val_n
1   350   350   350   350   350   350     350   350   350   350    0      0      0      0        3105.61
2   50    50    55    105   50    0       50    100   50    50     50     50     1025   1066.86  3185.6
3   0     0     0     0     0     3495.1  0     0     0     0      0      0      0      3495.1   3477.76

Desired DataFrame:

id  col1  col2  col3  col4  col5  col6     col7  col8  col9    col10  col11  col12  col13  col14    val_n    exceed_when  sum_col
1   350   350   350   350   350   350      350   350   305.61  0      0      0      0      0        3105.61  9            3500
2   50    50    55    105   50    0        50    100   50      50     50     50     1025   1066.86  3185.6                2751.86
3   0     0     0     0     0     3477.76  0     0     0       0      0      0      0      0        3477.76  6            6990.2
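
To make the expected numbers concrete, here is a minimal sketch of the arithmetic for the row with id 1, in plain Python with the values copied from the tables above. The variable names are only illustrative, and the comparison uses `>=` to match the attempt shown further down.

```python
# Values for id 1, taken from the input table above.
cols = [350] * 10 + [0] * 4
val_n = 3105.61

running = 0
exceed_when = None
for i, value in enumerate(cols, start=1):
    running += value
    if running >= val_n:
        exceed_when = i  # first column at which the running sum reaches val_n
        break

# Trim the crossing column by the overshoot so the columns up to it add up to val_n.
adjusted = round(cols[exceed_when - 1] - (running - val_n), 2)

print(exceed_when)  # 9
print(adjusted)     # 305.61
print(sum(cols))    # 3500
```

The running sum reaches 3150 at col9, which overshoots val_n by 44.39, so col9 becomes 350 - 44.39 = 305.61, while sum_col stays the total of the original values.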

Here is what I tried:

def trans(row):
    row['sum_col'] = 0
    row['exceed_ind'] = 0

    for i in range(1, 15):
        row['sum_col'] += row['col' + str(i)]

        if ((row['exceed_ind'] == 0) &
            (row['sum_col'] >= row['val_n'])):

            row['exceed_ind'] = 1
            row['exceed_when'] = i

        else:
            continue

        if row['exceed_when'] == i:
            row['col' + str(i)] = (
                row['col' + str(i)] - (
                    row['sum_col'] - row['val_n']))

        elif row['exceed_when'] < i:
            row['col' + str(i)] = 0

        else:
            row['col' + str(i)] = row['col' + str(i)]
    return row

df1 = df.apply(trans, axis=1)

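I get the correct results for sum_col and exceed_when, but the condition elif row['exceed_when'] < i does not seem to work, so the fourth point (replacing the remaining col values with 0) is not applied. I am not sure what I am missing.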

Code to generate the DataFrame:

import pandas as pd

df = pd.DataFrame({'id': [1, 2, 3],
                   'col1': [350, 50, 0],
                   'col2': [350, 50, 0],
                   'col3': [350, 55, 0],
                   'col4': [350, 105, 0],
                   'col5' : [350, 50, 0],
                   'col6': [350, 0, 3495.1],
                   'col7': [350, 50, 0],
                   'col8': [350, 100, 0],
                   'col9': [350, 50, 0],
                   'col10': [350, 50, 0],
                   'col11': [0, 50, 0],
                   'col12': [0, 50, 0],
                   'col13': [0, 1025, 0],
                   'col14': [0, 1066.86, 3495.1],
                   'val_n': [3105.61, 3185.6, 3477.76]
                   })

Thanks!

Tags: python, pandas

Solution


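As far as I know, the .apply function only passes a copy of each row, so all updates happen on that copy (and end up in the returned df1) rather than in the original DataFrame itself. Note also that in the code above the else: continue branch skips the rest of the loop body on every iteration except the one where val_n is first reached, so the elif row['exceed_when'] < i branch is never executed. One way to handle this is to iterate over the rows and update them through the index: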

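# Result columns; sum_col starts as a float because the sums are not whole numbers.
df['sum_col'] = 0.0
df['exceed_ind'] = 0
df['exceed_when'] = 0

# Note: writing a float into an integer column (e.g. col9) relies on pandas
# upcasting it; recent pandas versions warn about this, so casting the col
# columns to float beforehand may be needed.
for idx, row in df.iterrows():
    # Treat `row` as read-only and write back through df.loc.
    sum_col = 0
    exceed_ind = 0
    exceed_when = 0

    for i in range(1, 15):
        sum_col += row['col' + str(i)]

        if exceed_ind == 0 and sum_col >= row['val_n']:
            # First column at which the running sum reaches val_n:
            # record it and trim the column by the overshoot.
            exceed_ind = 1
            exceed_when = i
            df.loc[idx, 'exceed_ind'] = exceed_ind
            df.loc[idx, 'exceed_when'] = exceed_when
            df.loc[idx, 'col' + str(i)] = (row['col' + str(i)] - (sum_col - row['val_n']))

        elif exceed_ind == 1 and exceed_when < i:
            # Every column after the crossing point is set to zero.
            df.loc[idx, 'col' + str(i)] = 0

    # sum_col is the total of the original values, not the adjusted ones.
    df.loc[idx, 'sum_col'] = sum_col

print(df)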

Result (shown with id as the index):

    col1  col2  col3  col4  col5     col6  col7  col8    col9  col10  col11  \
id                                                                            
1    350   350   350   350   350   350.00   350   350  305.61      0      0   
2     50    50    55   105    50     0.00    50   100   50.00     50     50   
3      0     0     0     0     0  3477.76     0     0    0.00      0      0   

    col12  col13    col14    val_n  sum_col  exceed_ind  exceed_when  
id                                                                    
1       0      0     0.00  3105.61  3500.00           1            9  
2      50   1025  1066.86  3185.60  2751.86           0            0  
3       0      0     0.00  3477.76  6990.20           1            6  
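
For comparison, here is a minimal sketch that keeps the apply approach from the question but drops the else: continue branch, so the zeroing step is reached on the iterations after the crossing point. This is an illustrative variant under stated assumptions, not the method used above: the small three-column frame and the function name `cap_row` are made up, and the result has to be read from the returned DataFrame because the original is left unchanged.

```python
import pandas as pd

# Hypothetical small frame using the same column naming scheme as the question.
df = pd.DataFrame({
    "col1": [100.0, 10.0],
    "col2": [100.0, 10.0],
    "col3": [100.0, 10.0],
    "val_n": [150.0, 50.0],
})

def cap_row(row):
    row = row.copy()      # work on an explicit copy of the row
    total = 0.0
    exceed_when = 0       # 0 means val_n was never reached
    for i in range(1, 4):
        name = f"col{i}"
        total += row[name]
        if exceed_when == 0 and total >= row["val_n"]:
            # Crossing column: record it and trim it by the overshoot.
            exceed_when = i
            row[name] = row[name] - (total - row["val_n"])
        elif exceed_when != 0:
            # Columns after the crossing point are zeroed.
            row[name] = 0.0
    row["sum_col"] = total
    row["exceed_when"] = exceed_when
    return row

df1 = df.apply(cap_row, axis=1)  # df itself is not modified
print(df1)
```

In the first row the running sum reaches 150 at col2, so col2 is trimmed to 50 and col3 becomes 0; the second row never reaches its val_n and is returned unchanged apart from the two new columns.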

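The same result can probably also be computed without an explicit Python loop. The sketch below is one possible vectorized variant using a cumulative sum in NumPy. It is not part of the original answer, and names such as `value_cols`, `out`, and `first` are assumptions made for illustration. Rows that never reach val_n get exceed_when = 0, matching the loop version.

```python
import numpy as np
import pandas as pd

# Rebuild the question's data compactly, with id as the index.
rows = {
    1: [350] * 10 + [0] * 4,
    2: [50, 50, 55, 105, 50, 0, 50, 100, 50, 50, 50, 50, 1025, 1066.86],
    3: [0] * 5 + [3495.1] + [0] * 7 + [3495.1],
}
value_cols = [f"col{i}" for i in range(1, 15)]
df = pd.DataFrame.from_dict(rows, orient="index", columns=value_cols).astype(float)
df.index.name = "id"
df["val_n"] = [3105.61, 3185.6, 3477.76]

vals = df[value_cols].to_numpy()
limit = df["val_n"].to_numpy()[:, None]
csum = vals.cumsum(axis=1)            # running sum along each row

reached = csum >= limit               # True where the running sum has reached val_n
has_crossed = reached.any(axis=1)
first = reached.argmax(axis=1)        # 0-based position of the first True (0 if none)
pos = np.arange(vals.shape[1])

before = pos < first[:, None]         # columns before the crossing point
at = pos == first[:, None]            # the crossing column itself
trimmed = vals - (csum - limit)       # value minus the overshoot

new_vals = np.where(before, vals, np.where(at, trimmed, 0.0))
# Rows that never reach val_n keep their original values.
new_vals = np.where(has_crossed[:, None], new_vals, vals)

out = df.copy()
out[value_cols] = new_vals
out["sum_col"] = vals.sum(axis=1)     # total of the original values
out["exceed_when"] = np.where(has_crossed, first + 1, 0)
print(out)
```

Taking the first True position with argmax mirrors the loop's "first crossing" rule, so the sketch does not rely on the running sum being monotonic.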