Pandas: How to incrementally add to a column while its sum stays within the supply and each value stays within the corresponding column?

Problem description

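I am trying to increase a column by 1 at a time while the sum of that column is less than or equal to the total supply. Each value in the column also has to be less than or equal to the corresponding value in the "allocation" column. The supply variable changes dynamically between 1 and 400 based on user input. The desired output (the Allocation Final column) is shown below.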

supply = 14

| rank | allocation | Allocation Final |
| ---- | ---------- | ---------------- |
| 1    | 12         | 9                |
| 2    | 3          | 3                |
| 3    | 1          | 1                |
| 4    | 1          | 1                |
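Reading the table, the rule appears to be round-robin in rank order: each pass hands one unit to every row that is still below its allocation, until the supply runs out. A minimal plain-Python sketch of that reading (the function name `round_robin` is hypothetical, and rows are assumed to be listed in rank order) reproduces the Allocation Final column:

```python
def round_robin(allocation, supply):
    """Give one unit per pass to each row still below its cap, in list order.

    Assumes `allocation` is already sorted by rank (rank 1 first).
    """
    final = [0] * len(allocation)
    # Keep passing over the rows while units remain and some row is below its cap
    while supply > 0 and any(f < a for f, a in zip(final, allocation)):
        for i, cap in enumerate(allocation):
            if supply == 0:
                break
            if final[i] < cap:
                final[i] += 1
                supply -= 1
    return final


print(round_robin([12, 3, 1, 1], 14))  # [9, 3, 1, 1]
print(round_robin([12, 3, 1, 1], 5))   # [2, 1, 1, 1]
```

This loops once per unit of supply, which is cheap for supplies up to 400.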

Here is my code so far:

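```python
import pandas as pd

data = [[1.05493, 12], [.94248, 3], [.82317, 1], [.75317, 1]]
df = pd.DataFrame(data, columns=['score', 'allocation'])

# highest score gets rank 1, as in the table above
df['rank'] = df['score'].rank(ascending=False)
df['allocation_new'] = 0

# static for testing
supply = 14

# fills each row up to its cap before moving on to the next row
for index in df.index:
    # compare the total of the whole column (not a single cell) with the supply
    while (df.loc[index, 'allocation_new'] < df.loc[index, 'allocation']
           and df['allocation_new'].sum() < supply):
        df.loc[index, 'allocation_new'] += 1

print(df)
```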

Tags: python, pandas

Solution


This should do it:

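```python
import pandas as pd

def allocate(df, supply):
    # Rows are served in index order, so df should be sorted by rank first
    if supply > df['allocation'].sum():
        raise ValueError(f'Unachievable supply {supply}, maximum {df["allocation"].sum()}')

    df['allocation final'] = 0
    # Rows that can still receive more units
    under_alloc = df['allocation final'] < df['allocation']

    while (missing := supply - df['allocation final'].sum()) > 0:
        # Guaranteed by the supply check above
        assert under_alloc.any()

        # Fewer units left than open rows: one unit each to the first `missing` open rows
        if missing <= under_alloc.sum():
            df.loc[df.index[under_alloc.to_numpy()][:missing], 'allocation final'] += 1
            return df

        # Otherwise give every open row an equal share (rounded down), capped at its allocation
        df.loc[under_alloc, 'allocation final'] = (
            df.loc[under_alloc, 'allocation final'] + missing // under_alloc.sum()
        ).clip(upper=df.loc[under_alloc, 'allocation'])

        under_alloc = df['allocation final'] < df['allocation']

    return df
```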

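On each iteration, we split the missing quota evenly across the rows that have not yet reached their allocation, rounding down (`missing // under_alloc.sum()`), and then use `pd.Series.clip()` to make sure no row goes above its allocation.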
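To make the arithmetic concrete, here is a sketch of the two passes the loop performs for the question's data with supply = 14, written with standalone Series (`alloc` and `final` are illustrative names, not part of the answer's function):

```python
import pandas as pd

# The question's data, already in rank order
alloc = pd.Series([12, 3, 1, 1])
final = pd.Series([0, 0, 0, 0])
supply = 14

# Pass 1: every row is open; each gets 14 // 4 == 3, then clip to the caps
open_rows = final < alloc
share = (supply - final.sum()) // open_rows.sum()
final[open_rows] = (final[open_rows] + share).clip(upper=alloc[open_rows])
print(final.tolist())          # [3, 3, 1, 1]
print(supply - final.sum())    # 6

# Pass 2: only row 0 is still open, so it receives all 6 remaining units
open_rows = final < alloc
share = (supply - final.sum()) // open_rows.sum()
final[open_rows] = (final[open_rows] + share).clip(upper=alloc[open_rows])
print(final.tolist())          # [9, 3, 1, 1]
```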

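If fewer units are missing than there are rows still open for allocation (for example, running the same dataframe with supply=5 or 6), we give one unit each to the first `missing` ranks. "First" here means index order, so the dataframe should already be sorted by rank.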
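The same round-robin result can also be computed without a loop, using a different technique from the answer above: a "fill level" formulation. Every row receives `min(allocation, t)` for the highest level `t` whose total still fits in the supply, and the few leftover units go one each to the first rows whose allocation exceeds `t`. This is a sketch under the assumption that rows are in rank order; `allocate_by_level` is a hypothetical name, and unlike the answer it caps an oversized supply instead of raising:

```python
import numpy as np

def allocate_by_level(allocation, supply):
    """Round-robin allocation computed from a fill level instead of a loop.

    Assumes `allocation` is non-empty and ordered by rank (rank 1 first).
    """
    a = np.asarray(allocation, dtype=int)
    # Cannot hand out more than the caps allow
    supply = min(supply, int(a.sum()))
    # Total handed out at each candidate level t = 0..max(a)
    levels = np.arange(a.max() + 1)
    totals = np.minimum(a[None, :], levels[:, None]).sum(axis=1)
    # Highest level whose total still fits in the supply
    t = levels[totals <= supply].max()
    final = np.minimum(a, t)
    # Leftover units: one each to the first rows that can still take more
    leftover = supply - final.sum()
    above = np.flatnonzero(a > t)[:leftover]
    final[above] += 1
    return final


print(allocate_by_level([12, 3, 1, 1], 14).tolist())  # [9, 3, 1, 1]
print(allocate_by_level([12, 3, 1, 1], 5).tolist())   # [2, 1, 1, 1]
```

With a dataframe, one way to use it is `df = df.sort_values('rank')` followed by `df['allocation final'] = allocate_by_level(df['allocation'], supply)`.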

```python
>>> df = pd.DataFrame( {'allocation': {0: 12, 1: 3, 2: 1, 3: 1}, 'rank': {0: 1, 1: 2, 2: 3, 3: 4}})
>>> print(allocate(df, 14))
   allocation  rank  allocation final
0          12     1                 9
1           3     2                 3
2           1     3                 1
3           1     4                 1
>>> print(allocate(df, 5))
   allocation  rank  allocation final
0          12     1                 2
1           3     2                 1
2           1     3                 1
3           1     4                 1
```
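To confirm that an output meets both conditions from the question (the column uses exactly the supply, and no row exceeds its allocation), a small check like the one below can be run on the result. `check_allocation` is a hypothetical helper, and the frame simply re-types the supply=14 output printed above:

```python
import pandas as pd

def check_allocation(df, supply):
    """True if 'allocation final' sums to `supply` and never exceeds 'allocation'."""
    final = df['allocation final']
    return bool(final.sum() == supply and (final <= df['allocation']).all())


result = pd.DataFrame({
    'allocation': [12, 3, 1, 1],
    'rank': [1, 2, 3, 4],
    'allocation final': [9, 3, 1, 1],
})
print(check_allocation(result, 14))  # True
print(check_allocation(result, 15))  # False
```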
