Linear Regression and Python Code

PowerTransfer 2018-03-04 09:12

  Linear regression is the most typical regression problem: the target value has a linear relationship with all of the features. Linear regression is similar to logistic regression; the difference is that logistic regression applies a logistic function on top of the linear model, mapping the real-valued output onto the interval 0-1, and by setting a threshold the regression output becomes a 0/1 decision, i.e. binary classification.

  The linear regression model is $Y=XW$, where $Y$ is an $n\times 1$ vector, $X$ is an $n\times m$ matrix, and $W$ is an $m\times 1$ coefficient vector. Linear regression uses the squared loss. The reason is that minimizing the squared loss is exactly the least squares criterion, which is the maximum likelihood estimate when the residuals follow a normal distribution.

  The linear regression loss function is ${{l}_{W}}=\sum\limits_{i=1}^{n}{{{\left( {{y}_{i}}-{{X}_{i}}W \right)}^{2}}}$, where $X_i$ is a $1\times m$ row vector and $W$ is an $m\times 1$ vector.
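
  As a quick numerical check of the squared loss, the minimal sketch below (all values are toy numbers, purely for illustration) computes ${{l}_{W}}$ both as an explicit sum over samples and in vectorized form on the residual vector, and confirms the two agree:

import numpy as np

X = np.array([[1.0, 2.0], [2.0, 0.5], [0.0, 1.0]])   # n = 3 samples, m = 2 features (toy values)
W = np.array([[0.5], [1.0]])                          # m x 1 coefficient vector (toy values)
y = np.array([[2.4], [1.6], [1.1]])                   # n x 1 targets (toy values)

# explicit sum over samples: sum_i (y_i - X_i W)^2
loss_sum = sum((y[i, 0] - X[i] @ W[:, 0]) ** 2 for i in range(len(y)))

# vectorized form on the residual vector
residual = y - X @ W
loss_vec = (residual.T @ residual)[0, 0]

print(loss_sum, loss_vec)   # identical up to floating point error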

  There are several ways to solve linear regression, such as the direct least squares (normal equation) method, gradient descent, and Newton's method.

1. Least Squares

  The direct least squares method uses matrix algebra to obtain a closed-form solution for the coefficient vector W, as follows.

  The prediction function is $Y=XW$, and the loss function can be written as ${{\left( Y-XW \right)}^{T}}\left( Y-XW \right)$.

  Differentiating with respect to $W$ gives $\frac{d}{dW}{{\left( Y-XW \right)}^{T}}\left( Y-XW \right)=-2{{X}^{T}}\left( Y-XW \right)$; the derivation requires some matrix calculus.

  Setting the derivative to zero gives: $W={{\left( {{X}^{T}}X \right)}^{-1}}{{X}^{T}}Y$
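
  A minimal sketch of this closed-form solution on synthetic data is shown below (the random data and noise level are made up purely for illustration); it also cross-checks the result against NumPy's own least-squares solver:

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))                        # n = 100 samples, m = 3 features
true_W = np.array([[2.0], [-1.0], [0.5]])
Y = X @ true_W + 0.01 * rng.normal(size=(100, 1))    # targets with a little noise

W_ne = np.linalg.inv(X.T @ X) @ X.T @ Y              # normal equation: (X^T X)^-1 X^T Y
W_ls, *_ = np.linalg.lstsq(X, Y, rcond=None)         # NumPy's reference solution
print(np.allclose(W_ne, W_ls))                       # True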

2. Gradient Descent

  The linear regression loss function is ${{l}_{W}}=\sum\limits_{i=1}^{n}{{{\left( {{y}_{i}}-{{X}_{i}}W \right)}^{2}}}$. To minimize it, take the partial derivative with respect to the parameter $W$:

\[\frac{\partial {{l}_{W}}}{\partial W}=-2\sum\limits_{i=1}^{n}{\left( {{y}_{i}}-{{X}_{i}}W \right)}\cdot X_{i}^{T}\]

  From the expression above, the gradient with respect to $W$ has the same structure as in logistic regression: up to a constant factor, it is each sample's residual multiplied by that sample's feature values, summed over all samples; the $j$-th component of the gradient applies this to the $j$-th feature of every sample. Gradient descent then repeatedly applies the update $W\leftarrow W-\alpha \cdot \frac{\partial {{l}_{W}}}{\partial W}$ with learning rate $\alpha$, as sketched below.
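
  The following is a minimal sketch of one batch gradient step in NumPy on made-up toy data; the constant factor 2 is folded into the learning rate, which is also what the full code further down does:

import numpy as np

X = np.array([[1.0, 2.0], [2.0, 0.5], [0.0, 1.0]])   # n x m design matrix (toy values)
y = np.array([[2.4], [1.6], [1.1]])                   # n x 1 targets (toy values)
W = np.zeros((2, 1))                                  # initial weights
alpha = 0.01                                          # learning rate

err = X @ W - y        # residuals XW - y (sign flipped, so the gradient is subtracted below)
grad = X.T @ err       # X^T (XW - y): the gradient up to the constant factor 2
W = W - alpha * grad   # one gradient descent update
print(W)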

 

  Python code for linear regression with the least squares and gradient descent solvers follows:

 

# -*- coding: utf-8 -*-
"""
Created on Fri Jan 19 13:29:14 2018

@author: zhang
"""
import numpy as np
from sklearn.datasets import load_boston
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn import preprocessing


"""
多元线性回归需要对各变量进行标准化,因为在求系数wj梯度时,每个样本计算值与其标签值的差要与每个样本对应的第j个属性值相乘,然后求和
因此,如果属性值之间的差异太大,会造成系数无法收敛
"""

#  Least squares: solve for the weight vector directly (normal equation)
def least_square(train_x, train_y):
     """
     input:训练数据(样本*属性)和标签
     """  
     weights = (train_x.T * train_x).I * train_x.T * train_y
     return weights


# Batch gradient descent
def gradient_descent(train_x, train_y, maxCycle, alpha):
     
     numSamples, numFeatures = np.shape(train_x)
     weights = np.zeros((numFeatures,1))
     
     for i in range(maxCycle):
          h = train_x * weights                               # predictions for all samples
          err = h - train_y                                   # residuals (XW - y)
          weights = weights - (alpha * err.T * train_x).T     # W <- W - alpha * X^T(XW - y)
     return weights 

# Stochastic gradient descent: update the weights one sample at a time
def stochastic_gradient_descent(train_x, train_y, maxCycle, alpha):
     numSamples, numFeatures = np.shape(train_x)
     weights = np.zeros((numFeatures,1))
     
     for i in range(maxCycle):
          for j in range(numSamples):
               h = train_x[j,:] * weights
               err = h - train_y[j,0]           
               weights = weights - (alpha * err.T * train_x[j,:]).T  
               
     return weights


def load_data():
     boston = load_boston()     # note: load_boston was removed in scikit-learn 1.2, so an older version is needed here
     data = boston.data
     label = boston.target
     return data, label


def show_results(predict_y, test_y):
     plt.scatter(np.array(test_y), np.array(predict_y), marker='x', s= 30, c='red')   #  plotting needs plain arrays, not matrices
     plt.plot(np.arange(0,50),np.arange(0,50))
     plt.xlabel("original_label")
     plt.ylabel("predict_label")
     plt.title("LinearRegression")
     plt.show()     


if __name__ == "__main__":
     data, label = load_data()
     data = preprocessing.normalize(data.T).T   # scale each feature (column) to unit norm
     
     train_x, test_x, train_y, test_y = train_test_split(data, label, train_size = 0.75, random_state = 33)
     train_x = np.mat(train_x)
     test_x = np.mat(test_x)
     train_y = np.mat(train_y).T   #   np.mat turns the 1-D label array into a row vector, so transpose it to a column
     test_y = np.mat(test_y).T 

#     weights = least_square(train_x, train_y)
#     predict_y = test_x * weights
#     show_results(predict_y, test_y)
#   
     weights = gradient_descent(train_x, train_y, 1000, 0.01)
     predict_y = test_x * weights
     show_results(predict_y, test_y)
     
#     weights = stochastic_gradient_descent(train_x, train_y, 100, 0.01)
#     predict_y = test_x * weights
#     show_results(predict_y, test_y)
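
  As a follow-up sanity check (a sketch that assumes the script above has just been run, so least_square, train_x, train_y and the gradient descent weights are still in scope), the iterative solution can be compared against the closed-form one:

ls_weights = least_square(train_x, train_y)
print(np.max(np.abs(weights - ls_weights)))   # small once gradient descent has converged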

 
