Principal component analysis (PCA) dimensionality reduction in Python

Problem description

I have to implement my own PCA function, `Y, V = PCA(data, M, whitening)`, which computes the first M principal components and transforms the data, so that y_n = U^T x_n. The function should also return V, the amount of variance explained by the transformation.

I have to reduce the dimensionality of the data from D = 4 to M = 2. The given function skeleton is below:

def PCA(data, nr_dimensions=None, whitening=False):
    """ perform PCA and reduce the dimension of the data (D) to nr_dimensions
    Input:
        data... samples, nr_samples x D
        nr_dimensions... dimension after the transformation, scalar
        whitening... False -> standard PCA, True -> PCA with whitening

    Returns:
        transformed data... nr_samples x nr_dimensions
        variance_explained... amount of variance explained by the first nr_dimensions principal components, scalar"""
    if nr_dimensions is not None:
        dim = nr_dimensions
    else:
        dim = 2

What I have done so far is the following:

import numpy as np
import matplotlib.cm as cm
import matplotlib.mlab as mlab
import matplotlib.pyplot as plt
import scipy.stats as stats

from scipy.stats import multivariate_normal
import pdb

import sklearn
from sklearn import datasets

#covariance matrix
mean_vec = np.mean(data, axis=0)   # per-feature means
cov_mat = (data - mean_vec).T.dot((data - mean_vec)) / (data.shape[0] - 1)
print('Covariance matrix \n%s' % cov_mat)

#now the eigendecomposition of the cov matrix
cov_mat = np.cov(data.T)   # equivalent to the manual computation above
eig_vals, eig_vecs = np.linalg.eig(cov_mat)
print('Eigenvectors \n%s' % eig_vecs)
print('\nEigenvalues \n%s' % eig_vals)

# Make a list of (eigenvalue, eigenvector) tuples
eig_pairs = [(np.abs(eig_vals[i]), eig_vecs[:,i]) for i in range(len(eig_vals))]

This is where I am stuck: I do not know how to proceed from here and how to actually reduce the dimensionality.

Any help is welcome! :)
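For reference, the step missing from the approach above would be: sort `eig_pairs` by eigenvalue, stack the top M eigenvectors into a projection matrix, and project the centred data onto it. A minimal sketch of that step (the names `W`, `Y` and `variance_explained` are illustrative, not part of the original code):

    # sort the (eigenvalue, eigenvector) pairs by eigenvalue, largest first
    eig_pairs.sort(key=lambda pair: pair[0], reverse=True)

    M = 2                                                        # target dimensionality
    W = np.column_stack([eig_pairs[i][1] for i in range(M)])     # D x M projection matrix

    Y = (data - mean_vec) @ W                                    # projected data, nr_samples x M
    variance_explained = sum(p[0] for p in eig_pairs[:M]) / sum(p[0] for p in eig_pairs)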

Tags: python, numpy, machine-learning, scikit-learn, pca

Solution


PCA is essentially a singular value decomposition of the mean-centred data, so you can use numpy.linalg.svd:

import numpy as np

def PCA(U, ndim, whitening=False):
    U = U - U.mean(axis=0)                           # PCA operates on mean-centred data
    L, G, R = np.linalg.svd(U, full_matrices=False)  # U = L @ np.diag(G) @ R
    if not whitening:
        L = L * G                                    # scale each score column by its singular value
    Y = L[:, :ndim]                                  # transformed data, nr_samples x ndim
    V = np.sum(G[:ndim] ** 2) / np.sum(G ** 2)       # fraction of variance explained
    return Y, V
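A quick usage sketch of the function above; the iris data set is only an assumed stand-in for the asker's 4-dimensional data (the question already imports sklearn.datasets):

    from sklearn.datasets import load_iris

    data = load_iris().data        # shape (150, 4), i.e. D = 4
    Y, V = PCA(data, 2)            # reduce to M = 2 dimensions
    print(Y.shape)                 # (150, 2)
    print(V)                       # fraction of variance explained by the first 2 components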

If you want to go through the eigenvalue problem instead, then, assuming the number of samples is larger than the number of features (otherwise your data would be underfit), it is inefficient to compute the left eigenvectors (an nr_samples x nr_samples problem) directly. Instead, solve the eigenvalue problem for the small D x D matrix U^T U (the right eigenvectors) and reconstruct the left ones from it:

def PCA(U, ndim, whitening=False):
    U = U - U.mean(axis=0)                        # mean-centre the data
    K = U.T @ U                                   # small D x D scatter matrix
    G, R = np.linalg.eigh(K)                      # eigenvalues ascending, eigenvectors as columns
    G = G[::-1]                                   # reorder eigenvalues, largest first
    R = R[:, ::-1]                                # reorder the eigenvector columns to match
    L = U @ R                                     # reconstruct the left singular vectors
    nrm = np.linalg.norm(L, axis=0, keepdims=True)
    L /= nrm                                      # normalise them to unit length
    if not whitening:
        L = L * np.sqrt(G)                        # singular values are the square roots of G
    Y = L[:, :ndim]                               # transformed data, nr_samples x ndim
    V = np.sum(G[:ndim]) / np.sum(G)              # fraction of variance explained
    return Y, V
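If in doubt, either version can be sanity-checked against scikit-learn's own implementation; a sketch (individual components may come back with flipped signs, hence the comparison of absolute values):

    import numpy as np
    from sklearn.datasets import load_iris
    from sklearn.decomposition import PCA as SklearnPCA

    data = load_iris().data
    Y, V = PCA(data, 2)                                # either of the functions above

    ref = SklearnPCA(n_components=2).fit(data)
    Y_ref = ref.transform(data)

    print(np.allclose(np.abs(Y), np.abs(Y_ref)))       # True (up to per-component sign flips)
    print(V, ref.explained_variance_ratio_.sum())      # both report the explained-variance fraction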
