Sum n values in a numpy array based on a pandas index

Problem description

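I am trying to compute the sum of the first n values of a numpy array, where n is a value in each row of a pandas DataFrame. I set up a small example with a single column and it works, but it fails when the DataFrame has more than one column.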

Failing example (numpy imported as np, pandas as pd):

a=np.ones((10,))
df=pd.DataFrame([[4.,2],[6.,1],[5.,2.]],columns=['nj','ni'])
df['nj']=df['nj'].astype(int)
df['nsum']=df.apply(lambda x: np.sum(a[:x['nj']]),axis=1)
df
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_23612/1905114001.py in <module>
      2 df=pd.DataFrame([[4.,2],[6.,1],[5.,2.]],columns=['nj','ni'])
      3 df['nj']=df['nj'].astype(int)
----> 4 df['nsum']=df.apply(lambda x: np.sum(a[:x['nj']]),axis=1)
      5 df

C:\ProgramData\Anaconda3\envs\py37\lib\site-packages\pandas\core\frame.py in apply(self, func, axis, raw, result_type, args, **kwds)
   7766             kwds=kwds,
   7767         )
-> 7768         return op.get_result()
   7769 
   7770     def applymap(self, func, na_action: Optional[str] = None) -> DataFrame:

C:\ProgramData\Anaconda3\envs\py37\lib\site-packages\pandas\core\apply.py in get_result(self)
    183             return self.apply_raw()
    184 
--> 185         return self.apply_standard()
    186 
    187     def apply_empty_result(self):

C:\ProgramData\Anaconda3\envs\py37\lib\site-packages\pandas\core\apply.py in apply_standard(self)
    274 
    275     def apply_standard(self):
--> 276         results, res_index = self.apply_series_generator()
    277 
    278         # wrap results

C:\ProgramData\Anaconda3\envs\py37\lib\site-packages\pandas\core\apply.py in apply_series_generator(self)
    288             for i, v in enumerate(series_gen):
    289                 # ignore SettingWithCopy here in case the user mutates
--> 290                 results[i] = self.f(v)
    291                 if isinstance(results[i], ABCSeries):
    292                     # If we have a view on v, we need to make a copy because

~\AppData\Local\Temp/ipykernel_23612/1905114001.py in <lambda>(x)
      2 df=pd.DataFrame([[4.,2],[6.,1],[5.,2.]],columns=['nj','ni'])
      3 df['nj']=df['nj'].astype(int)
----> 4 df['nsum']=df.apply(lambda x: np.sum(a[:x['nj']]),axis=1)
      5 df

TypeError: slice indices must be integers or None or have an __index__ method

Working example:

a=np.ones((10,))
df=pd.DataFrame([4.,6.,5.],columns=['nj'])
df['nj']=df['nj'].astype(int)
df['nsum']=df.apply(lambda x: np.sum(a[:x['nj']]),axis=1)
df
    nj  nsum
0   4   4.0
1   6   6.0
2   5   5.0

In both cases:

print(a.shape)
print(a.dtype)
print(type(df))
print(df['nj'].dtype)

(10,)
float64
<class 'pandas.core.frame.DataFrame'>
int32
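
The checks above report the dtype of the nj column, but apply with axis=1 passes each row to the lambda as a Series. A row that spans an integer column and a float column is upcast to a common float64 dtype, so x['nj'] arrives as a float and cannot be used as a slice bound. The sketch below inspects the row dtype and casts the value back to int inside the lambda; the cast is one possible fix rather than something from the original post, and it assumes nj holds whole numbers.

```python
import numpy as np
import pandas as pd

a = np.ones((10,))
df = pd.DataFrame([[4., 2], [6., 1], [5., 2.]], columns=['nj', 'ni'])
df['nj'] = df['nj'].astype(int)

# Column dtypes: 'nj' is an integer column, 'ni' is float64.
print(df.dtypes)

# A row holds one value from each column, so pandas upcasts it to float64.
row = df.iloc[0]
print(row.dtype)
print(type(row['nj']))

# Casting inside the lambda restores an integer slice bound,
# while the other columns of the row stay available as x['ni'], etc.
df['nsum'] = df.apply(lambda x: np.sum(a[:int(x['nj'])]), axis=1)
print(df)
```

With a single integer column no upcast happens, which is why the one-column example works.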

A workaround, which is unsatisfying because I eventually want to use multiple columns in the lambda function:

tmp=pd.DataFrame(df['nj'])
df['nsum'] = tmp.apply(lambda x: np.sum(a[:x['nj']]),axis=1)

Any clarification on what I am missing here, or a better workaround?

Tags: pandas, numpy

Solution


IIUC, you can do this in numpy with numpy.take and numpy.cumsum:

# cumsum[k] is the sum of the first k + 1 values, so look up index nj - 1 (assumes nj >= 1)
np.take(np.cumsum(a, axis=0), df['nj'] - 1, axis=0)
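
Since np.cumsum(a)[k] is the sum of the first k + 1 values, one way to make index n line up with np.sum(a[:n]) is to prepend a zero to the cumulative sums. The sketch below is an illustration under assumptions: the array values, the DataFrame contents, and the name prefix are made up for the example, and every nj is assumed to lie between 0 and len(a).

```python
import numpy as np
import pandas as pd

a = np.arange(1.0, 11.0)  # 1.0, 2.0, ..., 10.0 (illustrative data)
df = pd.DataFrame({'nj': [4, 6, 0], 'ni': [2.0, 1.0, 2.0]})

# prefix[k] is the sum of a[:k]; prefix[0] is 0.0 for the empty slice.
prefix = np.concatenate(([0.0], np.cumsum(a)))

# Vectorized lookup: one fancy-indexing call instead of a per-row apply.
df['nsum'] = prefix[df['nj'].to_numpy()]

# Reference result from explicit slicing with Python int bounds.
expected = [a[:n].sum() for n in df['nj'].tolist()]
print(df)
print(expected)
```

Because the lookup reads the nj column directly, the float ni column never forces an upcast, and the other columns remain available for further vectorized arithmetic.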
