How to assign a (row) numpy.matrix to a column in a numpy.matrix

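Question

The relevant code is below. I create a zero matrix, which here happens to be 2x2. I then iterate through a data file and fill the matrix with random numbers that fall within the range of each column of the input data set. This works, except that the output matrix comes out transposed, and I would rather do it right. Please see the comments in the code.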

    centroids = mat(zeros((k,n))) #create centroid mat
    print('centroids: \n', centroids, '\n', type(centroids))
    print('starting for loop for j in range(%s)' %(n))
    for j in range(n):#create random cluster centers, within bounds of each dimension
        print('\n')
        # get the min value in the jth column
        minJ = min(dataSet[:,j]) 
        # get the max value in the jth column
        maxJ = max(dataSet[:,j]) 
        # the range of values is max - min
        rangeJ = maxJ - minJ
        print('col %s, min = %s, max = %s, range = %s' %(j, minJ, maxJ, rangeJ))
        # create a 'column' of random values for each column
        col =  mat(minJ + rangeJ * random.rand(1,k))
        print('column %s is %s, type is %s' %(j, col, type(col)))
        # assign columns to column in centroids
        # DOES NOT WORK, assigns to rows.
        centroids[j] = col
        print('   ==> centroids: \n', centroids)
    return centroids

Here is the output. Note that the output array /should/ be [[3.08, .434], [-1.36, -.203]].

centroids:
 [[0. 0.]
 [0. 0.]]
 <class 'numpy.matrix'>
starting for loop for j in range(2)


col 0, min = [[-5.379713]], max = [[4.838138]], range = [[10.217851]]
column 0 is [[ 3.08228829 -1.35924539]], type is <class 'numpy.matrix'>
   ==> centroids:
 [[ 3.08228829 -1.35924539]
 [ 0.          0.        ]]


col 1, min = [[-4.232586]], max = [[5.1904]], range = [[9.422986]]
column 1 is [[ 0.4342251 -0.2026065]], type is <class 'numpy.matrix'>
   ==> centroids:
 [[ 3.08228829 -1.35924539]
 [ 0.4342251  -0.2026065 ]]
================

centroids follows:
[[3.08228829]
 [0.4342251 ]]
[[-1.35924539]
 [-0.2026065 ]]

Here is what I tried:

centroids[:,j] = col
centroids[0:1,j] = col

Here is the error message:

Traceback (most recent call last):
  File "run.py", line 68, in <module>
    centroids = randCent(dataList, 2)
  File "run.py", line 51, in randCent
    centroids[0:1,j] = col
ValueError: could not broadcast input array from shape (1,2) into shape (1,1)

How can I do this without transposing the matrix? Thanks.

My script file is below:

 #run.py

from numpy import *
import sys
import importlib

#fun defs############################################

def testFun(name):
    print("Hello, %s" %(name))


def getData(fileName):
    data = []
    with open(fileName) as fr:  # the file is closed automatically on exit
        for line in fr:
            curLine = line.strip().split('\t')
            #print('A ',curLine, type(curLine)) ## gets a list of strings as a list
            fltLine = [float(i) for i in curLine] ## converts strings to floats
            #print('B ', fltLine, type(fltLine)) ## returns a list of floats
            data.append(fltLine)
    return data


def distEuclid(vecA, vecB):
    return sqrt(sum(power(vecA - vecB, 2))) #la.norm(vecA-vecB)


dataSet = [[3.141592653589793, 1.4142135623730951], [2.718281828459045, 1.618033988749895]]

def randCent(dataSet, k):
    print('calling randCent(dataset, %s)' %(k)) 
    dataSet = mat(dataSet)
    n = shape(dataSet)[1]
    print('columns n is %s and groups k is %s and type(dataSet) is %s' %(n, k, type(dataSet)))
    #print(dataSet)
    centroids = mat(zeros((k,n))) #create centroid mat
    print('centroids: \n', centroids, '\n', type(centroids))
    print('starting for loop for j in range(%s)' %(n))
    for j in range(n):#create random cluster centers, within bounds of each dimension
        print('\n')
        # get the min value in the jth column
        minJ = min(dataSet[:,j]) 
        # get the max value in the jth column
        maxJ = max(dataSet[:,j]) 
        # the range of values is max - min
        rangeJ = maxJ - minJ
        print('col %s, min = %s, max = %s, range = %s' %(j, minJ, maxJ, rangeJ))
        # create a 'column' of random values for each column
        col =  mat(minJ + rangeJ * random.rand(1,k))
        print('column %s is %s, type is %s' %(j, col, type(col)))
        # assign columns to column in centroids
        # DOES NOT WORK: this attempt raises the ValueError shown above
        # (centroids[j] = col runs, but assigns to rows instead).
        centroids[0:1,j] = col
        print('   ==> centroids: \n', centroids)
#    print('==> centroids: ', centroids)
    return centroids
    



#exe code#############################################
print("loading file run.py")
testFun('Bob')

dataList = None
dataList = getData('testSet.txt')
#print(dataList, type(dataList))

print('variable dataList has been initialized: %s' %(dataList is not None))
centroids = randCent(dataList, 2)
print('================\n')

print('centroids follows:')
print(centroids[:,0])
print(centroids[:,1])

Tags: python, numpy, matrix

Answer


Assigning to a 2D array:

In [500]: A = np.zeros((2,3), int)                                                                   
In [501]: A[0,:] = np.arange(3)                                                                      
In [502]: A[:,1] = [10,20]                                                                           
In [503]: A                                                                                          
Out[503]: 
array([[ 0, 10,  2],
       [ 0, 20,  0]])
In [504]: A = np.zeros((2,3), int)                                                                   
In [505]: A[0,:] = [1,2,3]                                                                           
In [506]: A[:,1] = [10,20]                                                                           
In [507]: A                                                                                          
Out[507]: 
array([[ 1, 10,  3],
       [ 0, 20,  0]])

Trying the same with np.matrix:

In [512]: M = np.matrix(np.zeros((2,3),int))                                                         
In [513]: M                                                                                          
Out[513]: 
matrix([[0, 0, 0],
        [0, 0, 0]])
In [514]: M[0,:] = [1,2,3]                                                                           
In [515]: M[:,1] = [10,20]                                                                           
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-515-e95a3ab21d7f> in <module>
----> 1 M[:,1] = [10,20]

ValueError: could not broadcast input array from shape (2) into shape (2,1)
In [516]: M[:,1] = [[10],[20]]                                                                       
In [517]: M                                                                                          
Out[517]: 
matrix([[ 1, 10,  3],
        [ 0, 20,  0]])

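Why the difference? Because once a matrix, always a matrix: indexing an np.matrix always returns a 2D result, whereas indexing an array can drop a dimension: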

In [518]: A[:,1]                                                                                     
Out[518]: array([10, 20])
In [519]: M[:,1]                                                                                     
Out[519]: 
matrix([[10],
        [20]])

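To assign to a slot of shape (2,1), you need a value of shape (2,1). Broadcasting can only add leading dimensions, so a (2,) value can be expanded to (1,2) but not to (2,1).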
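
Applied to the loop in the question, one option is to draw the random values with shape (k, 1) in the first place, so they match the slot that `centroids[:, j]` selects. The following is only a minimal sketch under assumptions: `data` is a made-up stand-in for the contents of testSet.txt, and the names `min_j` and `range_j` are illustrative, not taken from the original script.

```python
import numpy as np

# Hypothetical stand-in for the question's dataSet (values are made up).
data = np.matrix([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
k = 2                      # number of centroids
n = data.shape[1]          # number of columns (dimensions)

centroids = np.matrix(np.zeros((k, n)))
for j in range(n):
    min_j = data[:, j].min()            # scalar minimum of column j
    range_j = data[:, j].max() - min_j  # scalar range of column j
    # Draw the random values as a (k, 1) column so the shape matches
    # the (k, 1) slot that centroids[:, j] selects.
    col = min_j + range_j * np.random.rand(k, 1)
    centroids[:, j] = col

print(centroids.shape)
```

If `col` is kept as the (1, k) row matrix from the question, `centroids[:, j] = col.T` should have the same effect, since `col.T` has shape (k, 1). Only the small row is transposed, not the result matrix.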

A flatiter can be used to assign a 1D array:

In [520]: M[:,1].flat                                                                                
Out[520]: <numpy.flatiter at 0x7f57b127dda0>
In [521]: M[:,1].flat = [100,200]                                                                    
In [522]: M                                                                                          
Out[522]: 
matrix([[  1, 100,   3],
        [  0, 200,   0]])                          
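
If np.matrix is not a requirement, working with plain arrays sidesteps the shape mismatch, and the per-column loop can be replaced by broadcasting. This is a sketch rather than the original poster's method: `rand_cent` is a hypothetical name, the input values are made up, and `np.random.default_rng` is swapped in for the `random.rand` call used in the question.

```python
import numpy as np

def rand_cent(data, k, seed=None):
    """Return a (k, n) array of random points within each column's bounds."""
    data = np.asarray(data, dtype=float)
    rng = np.random.default_rng(seed)
    mins = data.min(axis=0)            # shape (n,): per-column minimum
    ranges = data.max(axis=0) - mins   # shape (n,): per-column range
    # A (n,) array broadcasts against (k, n) by gaining a leading
    # dimension, so each column is scaled by its own min and range.
    return mins + ranges * rng.random((k, data.shape[1]))

# Made-up data: column 0 spans 1..3, column 1 spans 10..30.
centroids = rand_cent([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]], k=2, seed=0)
print(centroids.shape)
```

Each row of the result is one centroid and each column stays within the bounds of the matching input column, which is the layout the question asks for.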
