The fastest way in NumPy to apply an array of functions to the columns of a matrix

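Problem description

I have an array of functions with shape (n,) and a numpy matrix with shape (m, n). I want to apply each function to the corresponding column of the matrix, i.e.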

matrix[:, i] = funcs[i](matrix[:, i])

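I can do this with a for loop (see the example below), but for loops are generally discouraged in numpy. My question is: what is the fastest (and preferably the most elegant) way to do this?

A working example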

import numpy as np

# Example of functions to apply to each column
funcs  = np.array([np.vectorize(lambda x: x+1),
                   np.vectorize(lambda x: x-2),
                   np.vectorize(lambda x: x+3)])
# Initialise dummy matrix
matrix = np.random.rand(50, 3)

# Apply each function to each column
for i in range(funcs.shape[0]):
    matrix[:, i] = funcs[i](matrix[:, i])

Tags: python, numpy

Solution


For an array with many rows and only a few columns, a simple iteration over the columns should be time-efficient:

In [783]: funcs = [lambda x: x+1, lambda x: x+2, lambda x: x+3]
In [784]: arr = np.arange(12).reshape(4,3)
In [785]: for i in range(3):
     ...:     arr[:,i] = funcs[i](arr[:,i])
     ...:     
In [786]: arr
Out[786]: 
array([[ 1,  3,  5],
       [ 4,  6,  8],
       [ 7,  9, 11],
       [10, 12, 14]])

If the functions already work on 1d array inputs, there's no need for np.vectorize (np.vectorize is generally slower than plain iteration anyway). Also, for iteration like this there's no need to wrap the list of functions in an array; iterating over a list is faster.
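
As a rough check of the point about np.vectorize, here is a minimal sketch that applies the question's three functions both as plain lambdas and wrapped in np.vectorize, then compares the results. The names plain_funcs, vectorized_funcs, via_plain and via_vectorize are made up for this illustration, and the seeded generator is only there to make the run repeatable.

```python
import numpy as np

# The question's functions, written as plain lambdas that accept a whole 1d array.
plain_funcs = [lambda x: x + 1, lambda x: x - 2, lambda x: x + 3]
# The same functions wrapped in np.vectorize, as in the question.
vectorized_funcs = [np.vectorize(f) for f in plain_funcs]

rng = np.random.default_rng(0)  # seeded only so the sketch is repeatable
matrix = rng.random((50, 3))

via_plain = matrix.copy()
via_vectorize = matrix.copy()

# Column loop using the plain lambdas (one array operation per column).
for i, f in enumerate(plain_funcs):
    via_plain[:, i] = f(via_plain[:, i])

# Column loop using the np.vectorize wrappers (one Python call per element).
for i, f in enumerate(vectorized_funcs):
    via_vectorize[:, i] = f(via_vectorize[:, i])

same_result = np.allclose(via_plain, via_vectorize)
print(same_result)
```

Both loops should produce the same matrix; the difference is only in how much Python-level work is done per element, so timing them (for example with %timeit) is the way to see the cost on your own data.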

A variation on the indexed iteration:

In [787]: for f, col in zip(funcs, arr.T):
     ...:     col[:] = f(col)
     ...:     
In [788]: arr
Out[788]: 
array([[ 2,  5,  8],
       [ 5,  8, 11],
       [ 8, 11, 14],
       [11, 14, 17]])

I use arr.T here so the iteration is on the columns of arr, not the rows.
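
The zip variation relies on each col being a view into arr, so that assigning to col[:] writes back into the original array. The sketch below tries to make that visible; views_share_memory and untouched are names invented for this illustration.

```python
import numpy as np

funcs = [lambda x: x + 1, lambda x: x + 2, lambda x: x + 3]
arr = np.arange(12).reshape(4, 3)

# Iterating over arr.T yields one view per column of arr.
views_share_memory = all(np.shares_memory(col, arr) for col in arr.T)

# Slice assignment writes through the view, so arr itself is updated.
for f, col in zip(funcs, arr.T):
    col[:] = f(col)

# By contrast, rebinding the loop variable only changes the local name.
untouched = np.arange(12).reshape(4, 3)
for f, col in zip(funcs, untouched.T):
    col = f(col)  # result is discarded; untouched is not modified

print(views_share_memory)
print(arr)
print(untouched)
```

The col[:] = ... form is what makes the loop modify arr in place; dropping the [:] silently leaves the array unchanged.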

A general observation: a few iterations over a complex task is perfectly good numpy style. Many iterations over simple tasks are slow, and should be moved into compiled code where possible.
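
For the specific functions in the question (add a constant to each column), the per-column work can be moved entirely into compiled code with broadcasting. This is only a sketch for that special case, not a general replacement for a list of arbitrary functions; the offsets array is a hypothetical stand-in for the three lambdas.

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded only so the sketch is repeatable
matrix = rng.random((50, 3))

# Reference result: the column loop with the question's three functions.
funcs = [lambda x: x + 1, lambda x: x - 2, lambda x: x + 3]
looped = matrix.copy()
for i, f in enumerate(funcs):
    looped[:, i] = f(looped[:, i])

# Hypothetical equivalent: one offset per column, broadcast across all rows.
offsets = np.array([1.0, -2.0, 3.0])
broadcast = matrix + offsets  # shape (50, 3) + shape (3,) -> shape (50, 3)

print(np.allclose(looped, broadcast))
```

When the functions cannot be expressed as a single array expression like this, the column loop shown earlier remains a reasonable choice.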

