Scipy minimize says it succeeded, then keeps issuing warnings

Problem description

I am trying to minimize a function, and I am displaying the progress SciPy makes as it runs. The first message displayed is . . .

Optimization terminated successfully.
         Current function value: 0.000113
         Iterations: 32
         Function evaluations: 13299
         Gradient evaluations: 33

This looks promising. The problem is that the process does not terminate. In fact, it goes on to emit messages such as

Warning: Maximum number of iterations has been exceeded.
         Current function value: 0.023312
         Iterations: 50
         Function evaluations: 20553
         Gradient evaluations: 51
Warning: Maximum number of iterations has been exceeded.
         Current function value: 0.068360
         Iterations: 50
         Function evaluations: 20553
         Gradient evaluations: 51
Warning: Maximum number of iterations has been exceeded.
         Current function value: 0.071812
         Iterations: 50
         Function evaluations: 20553
         Gradient evaluations: 51
Warning: Maximum number of iterations has been exceeded.
         Current function value: 0.050061
         Iterations: 50
         Function evaluations: 20553
         Gradient evaluations: 51

Below is the code that calls minimize internally (it calls minimize once per label in a one-vs-all loop, which is why the messages repeat):

import numpy as np
from scipy.optimize import minimize

def one_vs_all(X, y, num_labels, lmbda):

  # store dimensions of X that will be reused
  m = X.shape[0]
  n = X.shape[1]

  # append ones vector to X matrix
  X = np.column_stack((np.ones((m, 1)), X))

  # create matrix in which thetas will be returned
  all_theta = np.zeros((num_labels, n+1))

  for i in np.arange(num_labels):
    # note theta should be the first arg in the objective function signature, followed by X and y
    init_theta = np.zeros((n+1, 1))
    theta = minimize(lrCostFunctionReg, x0=init_theta, args=(X, (y == i)*1, lmbda),
                      options={'disp': True, 'maxiter': 50})
    all_theta[i] = theta.x
  return all_theta

I have tried changing the minimization method and varying the iteration limit from as low as 30 to as high as 1000. I have also tried supplying my own gradient function. In every case the routine eventually produces an answer, but it is completely wrong. Does anyone know what is going on?

EDIT: The function is differentiable. Here is the cost function, followed by its gradient (unregularized, then regularized).

def lrCostFunctionReg(theta, X, y, lmbda):

  m = X.shape[0]

  # hypothesis; sigmoid is assumed to be the standard logistic function,
  # e.g. sigmoid = lambda z: 1 / (1 + np.exp(-z))
  h = sigmoid(X @ theta)

  # regularization term (the intercept theta[0] is excluded)
  reg_term = (lmbda / (2*m)) * (theta[1:,].T @ theta[1:,])

  # regularized logistic-regression cost
  cost_reg = (1/m) * (-(y.T @ np.log(h)) - ((1 - y).T @ np.log(1 - h))) + reg_term

  return cost_reg

def gradFunction(theta, X, y):

  m = X.shape[0]

  # theta reshaped to a column vector to ensure proper matrix operations
  theta = np.reshape(theta, (theta.size, 1))

  # hypothesis as generated in the cost function
  h = sigmoid(X @ theta)

  # unregularized gradient
  grad = (1/m) * np.dot(X.T, (h - y))

  return grad

def lrGradFunctionReg(theta, X, y, lmbda):

  m = X.shape[0]

  # theta reshaped to ensure proper operation
  theta = np.reshape(theta, (theta.size, 1))

  # generate unregularized gradient
  grad = gradFunction(theta, X, y)

  # regularize the gradient w/o touching the intercept; essential that only one index is used
  grad[1:,] = ((lmbda / m) * theta[1:,]) + grad[1:,]

  return grad.flatten()
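
Since the edit asserts the function is differentiable, it is worth verifying a hand-written gradient before handing it to the optimizer. scipy.optimize.check_grad compares an analytic gradient against a finite-difference estimate. Here is a minimal sketch on a toy quadratic (the function f, its gradient grad_f, and the starting point are illustrative, not from the post):

import numpy as np
from scipy.optimize import minimize, check_grad

def f(x):
  # toy objective: sum of squares
  return np.sum(x**2)

def grad_f(x):
  # analytic gradient of f
  return 2 * x

x0 = np.array([1.0, -2.0, 3.0])

# norm of the difference between grad_f and a finite-difference estimate;
# a correct gradient yields a value near zero
print(check_grad(f, grad_f, x0))

# an analytic gradient is supplied to minimize via the jac parameter
res = minimize(f, x0=x0, jac=grad_f)
print(res.x)   # approximately [0, 0, 0]

Note that check_grad expects func to return a scalar and grad to return a 1-D array, which already hints at the shape issue discussed in the solution below.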

Tags: python, numpy, machine-learning, scipy, scipy-optimize

Solution

To answer my own question: the problem turned out to be one of vector shapes. I like to code in 2-D, but the SciPy optimization routines only work with column and row vectors that have been "flattened" into 1-D arrays. Genuinely multi-dimensional matrices are fine, but column vectors and row vectors are a step too far.

For example, if y is a vector of labels and y.shape is (400, 1), you need to call y.flatten(), which makes y.shape == (400,). SciPy will then work happily with your data, assuming all the other dimensions make sense.
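
The failure mode is easy to reproduce: minimize hands the objective a flattened 1-D theta, and mixing a 1-D array with a 2-D column vector makes NumPy broadcast into a matrix instead of raising an error. A minimal sketch (the shapes are illustrative):

import numpy as np

h = np.full(5, 0.5)             # 1-D hypothesis, shape (5,) -- what minimize produces
y = np.ones((5, 1))             # 2-D column vector of labels, shape (5, 1)

print((h - y).shape)            # (5, 5): silent broadcasting, not an error
print((h - y.flatten()).shape)  # (5,): the intended element-wise difference

Because the broadcast result is still a valid array, nothing crashes; the optimizer just works with the wrong quantity.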

So if your port of MATLAB machine-learning code to Python has stalled, check that you have flattened your row and column vectors, especially the ones returned by a gradient function.
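
Applied to the code in the question, that means flattening the label vector and using a 1-D initial theta before calling minimize. A sketch of just the loop (assuming the functions defined above; the gradient functions would likewise need to keep everything 1-D throughout if passed via jac):

for i in np.arange(num_labels):
  init_theta = np.zeros(n + 1)            # 1-D, not (n+1, 1)
  labels = ((y == i) * 1).flatten()       # 1-D labels, not (m, 1)
  res = minimize(lrCostFunctionReg, x0=init_theta,
                 args=(X, labels, lmbda),
                 options={'disp': True, 'maxiter': 50})
  all_theta[i] = res.x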

