Vectorized linear regression

Problem description

Here is my attempt to perform linear regression using just numpy and linear algebra:

import numpy as np

def linear_function(w , x , b):
    return np.dot(w , x) + b

x = np.array([[1, 1,1],[0, 0,0]])
y = np.array([0,1])

w = np.random.uniform(-1,1,(1 , 3))

print(w)
learning_rate = .0001

xT = x.T
yT = y.T

for i in range(30000):

    h_of_x = linear_function(w , xT , 1)
    loss = h_of_x - yT

    if i % 10000 == 0:
        print(loss , w)
    w = w + np.multiply(-learning_rate , loss)

linear_function(w , x , 1)

This causes an error:

ValueError                                Traceback (most recent call last)
<ipython-input-137-130a39956c7f> in <module>()
     24     if i % 10000 == 0:
     25         print(loss , w)
---> 26     w = w + np.multiply(-learning_rate , loss)
     27 
     28 linear_function(w , x , 1)

ValueError: operands could not be broadcast together with shapes (1,3) (1,2) 

This appears to work with a reduced training-set dimensionality:

import numpy as np

def linear_function(w , x , b):
    return np.dot(w , x) + b

x = np.array([[1, 1],[0, 0]])
y = np.array([0,1])

w = np.random.uniform(-1,1,(1 , 2))

print(w)
learning_rate = .0001

xT = x.T
yT = y.T

for i in range(30000):

    h_of_x = linear_function(w , xT , 1)
    loss = h_of_x - yT

    if i % 10000 == 0:
        print(loss , w)
    w = w + np.multiply(-learning_rate , loss)

linear_function(w , x , 1)

print(linear_function(w , x[0] , 1))
print(linear_function(w , x[1] , 1))

Which returns :

[[ 0.68255806 -0.49717912]]
[[ 1.18537894  0.        ]] [[ 0.68255806 -0.49717912]]
[[ 0.43605474  0.        ]] [[-0.06676614 -0.49717912]]
[[ 0.16040755  0.        ]] [[-0.34241333 -0.49717912]]
[ 0.05900769]
[ 1.]

[ 0.05900769] and [ 1.] are close to the training targets, so this implementation appears to be correct. What is the issue with the first implementation that throws the error? Have I not extended the dimensionality from 2 -> 3 correctly?

Tags: python, numpy, linear-algebra, linear-regression

Solution


I've outlined the issues below:

  1. Your array shapes are inconsistent, which causes problems for broadcasting and dot products, especially during gradient descent. Fix your initialisation. I would also recommend augmenting w with b and X with a column of ones.

  2. Your loss function and gradient calculation aren't right. The "loss" here is just the raw residual h_of_x - y, and it is added to w directly, which is not the gradient of any loss. It also has shape (1, number of samples) while w has shape (1, number of features), so the update only broadcast in the 2-feature case because those two numbers happened to coincide; with 3 features and 2 samples it fails, which is exactly the error you see. I would minimise the sum of squared errors instead (ordinary least squares, i.e. OLS regression) and descend along its gradient with respect to w, which is (h_of_x - y).dot(x) up to a constant factor of 2 that can be folded into the learning rate; see the sketch just after this list.

  3. Your update rule changes accordingly, based on (2): move w a small step against that gradient rather than adding the raw residual to it.

  4. Make sure to include a stopping condition, so you don't keep iterating once the fit stops improving. A common choice is to stop when the loss no longer changes by more than a small tolerance between iterations, which is what the listing below does.
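
To make the gradient in (2) concrete, here is a minimal sanity-check sketch (none of this is from the original post; the data and names just mirror the listing below). For the sum-of-squares loss ((w.dot(x.T) - y) ** 2).sum(), the analytic gradient with respect to w is 2 * (w.dot(x.T) - y).dot(x), and a central finite-difference check confirms it:

import numpy as np

# toy data in the same row-vector-w, augmented-x convention as the listing below
x = np.column_stack((np.ones(2), np.array([[1, 1, 1], [0, 0, 0]])))   # shape (2, 4)
y = np.array([[0, 1]])                                                # shape (1, 2)
w = np.random.uniform(-1, 1, (1, 4))

def sum_of_squares(w):
    return ((w.dot(x.T) - y) ** 2).sum()

# analytic gradient of the sum-of-squares loss with respect to w
analytic = 2 * (w.dot(x.T) - y).dot(x)

# central finite-difference approximation, one weight at a time
eps = 1e-6
numeric = np.zeros_like(w)
for j in range(w.shape[1]):
    step = np.zeros_like(w)
    step[0, j] = eps
    numeric[0, j] = (sum_of_squares(w + step) - sum_of_squares(w - step)) / (2 * eps)

print(np.allclose(analytic, numeric, atol=1e-4))   # expected: True

The constant factor of 2 is dropped in the listing below; it only rescales the learning rate.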

Full listing:

import numpy as np

# input, augmented with a column of ones for the bias
x = np.array([[1, 1, 1], [0, 0, 0]])
x = np.column_stack((np.ones(len(x)), x))
# targets
y = np.array([[0, 1]])
# weights, augmented with bias
w = np.random.uniform(-1, 1, (1, 4))

learning_rate = .0001

loss_old = np.inf
for i in range(30000):  
    h_of_x = w.dot(x.T)
    loss = ((h_of_x - y) ** 2).sum()

    if abs(loss_old - loss) < 1e-5:
        break

    w = w - learning_rate * (h_of_x - y).dot(x)
    loss_old = loss
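
Once the loop exits, the fitted w can be used for prediction directly. A small usage sketch (x_new is just a hypothetical extra input, not from the original post):

# predictions on the (augmented) training inputs; these should be reasonably close to y,
# depending on the learning rate and stopping tolerance
print(w.dot(x.T))

# a new input must be augmented with the same leading column of ones
x_new = np.column_stack((np.ones(1), np.array([[1, 1, 0]])))
print(w.dot(x_new.T))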

Other Recommendations/Enhancements

Next, consider the use of regularisation here. L2 (ridge) and L1 (lasso) penalties are both good options.
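
For instance, a minimal sketch of adding an L2 (ridge) penalty to the gradient-descent loop above (lam is an assumed hyperparameter; in practice the bias weight is often excluded from the penalty):

lam = 0.1   # regularisation strength (assumed value; tune for your data)

loss_old = np.inf
for i in range(30000):
    h_of_x = w.dot(x.T)
    # sum-of-squares loss plus the L2 penalty lam * ||w||^2
    loss = ((h_of_x - y) ** 2).sum() + lam * (w ** 2).sum()

    if abs(loss_old - loss) < 1e-5:
        break

    # the penalty simply adds lam * w to the gradient (constant 2 folded into learning_rate)
    w = w - learning_rate * ((h_of_x - y).dot(x) + lam * w)
    loss_old = loss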

Finally, there is a closed-form solution for linear regression that yields the exact least-squares fit in one step, with no learning rate to tune (gradient descent only approaches the optimum iteratively). It can become computationally expensive for a large number of features, since it involves a matrix inverse, so weigh the tradeoffs for your problem size.

# closed-form solution in the same row-vector convention; for this underdetermined
# system it gives the minimum-norm w that fits the training targets exactly
w = y.dot(np.linalg.inv(x.dot(x.T)).dot(x))

When x.dot(x.T) is not invertible, you will need to regularise.
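
A sketch of that regularised (ridge) closed form, with an assumed lam and the same shape conventions as above; the added diagonal term makes the matrix invertible and shrinks w (the bias is included in the penalty here, for simplicity):

lam = 0.1   # assumed regularisation strength
w = y.dot(np.linalg.inv(x.dot(x.T) + lam * np.eye(len(x))).dot(x))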

Keep in mind that linear regression can only model linear relationships between the inputs and the output. If you're convinced your implementation is correct and your loss is still bad, your data may not be fittable in its current vector space, so you will need non-linear basis functions to transform it (this is effectively non-linear regression); a sketch follows.
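
A minimal sketch of one such basis expansion (the quadratic feature map here is an illustrative assumption, not something from the original post); the model stays linear in w, only the features change:

import numpy as np

# same toy data, but each raw feature is expanded to (feature, feature ** 2)
x_raw = np.array([[1, 1, 1], [0, 0, 0]])
y = np.array([[0, 1]])
phi = np.column_stack((np.ones(len(x_raw)), x_raw, x_raw ** 2))   # bias + linear + quadratic terms

w = np.random.uniform(-1, 1, (1, phi.shape[1]))
learning_rate = .0001
for i in range(30000):
    h_of_x = w.dot(phi.T)
    w = w - learning_rate * (h_of_x - y).dot(phi)

print(w.dot(phi.T))   # should move toward [[0, 1]]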

