Weighting multiple columns by factors over different ranges

Problem description

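I have a DataFrame df with three columns: A, B, and C. I want to create a weighted-average column, but I want to test different weights (the weights must sum to 100%).

So I could do this: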

import numpy as np

weights = np.arange(0, 1, 0.05)

for i in weights:
    for j in weights:
        for k in weights:
            # keep only combinations whose weights sum to 100%
            if np.isclose(i + j + k, 1):
                outname = f'{i:.2f}A{j:.2f}B{k:.2f}C'
                df[outname] = df['A'].multiply(i) + df['B'].multiply(j) + df['C'].multiply(k)
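
A note on the sum check in the loop above: the values from np.arange(0, 1, 0.05) are floating-point numbers, so testing the sum for exact equality with 1 can reject combinations that are valid on paper. The sketch below counts how many triples pass an exact test and how many pass a tolerant test with np.isclose. The counter names are made up for illustration.

```python
import numpy as np

weights = np.arange(0, 1, 0.05)

exact = 0      # triples where i + j + k == 1 holds exactly
tolerant = 0   # triples where the sum is 1 within np.isclose's default tolerance

for i in weights:
    for j in weights:
        for k in weights:
            total = i + j + k
            if total == 1:
                exact += 1
            if np.isclose(total, 1):
                tolerant += 1

print(exact, tolerant)
```

Any triple that passes the exact test also passes the tolerant one, so the second count can never be smaller than the first.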

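However, the number of columns may grow, and each extra column needs another nested loop, so this approach stops working.

Can anyone see a smart way to do this?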

Tags: python, pandas, loops

Solution


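Here is what you want. The snippet draws weights at random from the pool until they sum to 1, and it works for any number of columns: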

from random import randint

import numpy as np
import pandas as pd

df = pd.DataFrame([[0, 1, 2], [3, 4, 5], [6, 7, 8]], columns=['A', 'B', 'C'])
cols = list(df.columns)  # fix the input columns before new ones are added
weightpool = np.arange(0, 1, 0.05)

for times in range(1, 3):
    # draw one weight per column from the pool until all weights sum to 1
    weights = np.zeros(len(cols))
    while not np.isclose(weights.sum(), 1):
        weights = np.array([weightpool[randint(0, len(weightpool) - 1)] for _ in cols])

    # name the new column after its weights and store the weighted sum
    outname = ''.join(f'{w:.2f}{c}' for w, c in zip(weights, cols))
    df[outname] = df[cols].multiply(weights, axis=1).sum(axis=1)

df
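
If every valid combination is needed rather than a random sample, one possible alternative is to enumerate the weights with itertools.product. This is a minimal sketch, not part of the original answer: the function name add_weighted_columns and the steps parameter are hypothetical, and it assumes weights are multiples of 1/steps from 0 to 1 inclusive (unlike np.arange(0, 1, 0.05), which stops at 0.95). Counting in integer steps keeps the sum-to-one check exact.

```python
from itertools import product

import pandas as pd


def add_weighted_columns(df, cols, steps=20):
    """Return a copy of df with one weighted-sum column per weight combination.

    Hypothetical helper: weights are multiples of 1/steps, enumerated as
    integers so that the "sums to 100%" test needs no float tolerance.
    """
    new_cols = {}
    for combo in product(range(steps + 1), repeat=len(cols)):
        if sum(combo) != steps:
            continue  # skip combinations that do not sum to 100%
        weights = [n / steps for n in combo]
        name = ''.join(f'{w:.2f}{c}' for w, c in zip(weights, cols))
        new_cols[name] = df[cols].multiply(weights, axis=1).sum(axis=1)
    # attach all new columns at once instead of inserting them one by one
    return pd.concat([df, pd.DataFrame(new_cols, index=df.index)], axis=1)


df = pd.DataFrame([[0, 1, 2], [3, 4, 5], [6, 7, 8]], columns=['A', 'B', 'C'])
result = add_weighted_columns(df, ['A', 'B', 'C'], steps=4)
print(result.shape)
```

The loop visits (steps + 1) ** len(cols) tuples, so the work grows quickly with both the column count and the step resolution. For many columns, random sampling as in the answer above may be the more practical choice.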
