How to split an array into 28x28 dimensions - wrong output

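Problem description

I need to define a function that takes a CSV file and splits it into two arrays: labels and images (pixel values). I think I managed to separate them into these two arrays, but I am having trouble reshaping the pixel values to 28x28. My function looks like this: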

def get_data(filename):
    with open(filename) as training_file:
        training_file = np.loadtxt(training_file, delimiter=",", skiprows=1, dtype='str')
        labels = training_file[:, [0]]
        images = training_file[:, 1:784]
        images = np.array_split(images, 28)
        images = np.asarray(images)
        return images, labels

I then call the function like this:

path_sign_mnist_train = f"{getcwd()}/../tmp2/sign_mnist_train.csv"
path_sign_mnist_test = f"{getcwd()}/../tmp2/sign_mnist_test.csv"
training_images, training_labels = get_data(path_sign_mnist_train)
testing_images, testing_labels = get_data(path_sign_mnist_test)

The expected output is:

print(training_images.shape)
print(training_labels.shape)
print(testing_images.shape)
print(testing_labels.shape)

# Their output should be:
# (27455, 28, 28)
# (27455,)
# (7172, 28, 28)
# (7172,)

Instead, I get this:

(28,)
(27455, 1)
(28,)
(7172, 1)

Any suggestions on how to fix this?

Tags: python, numpy

Answer


np.array_split(x, 28) splits your array into 28 sub-arrays along axis 0 (the rows). So if x has shape (2800, 10), you get 28 arrays of shape (100, 10). That is not what you want here.
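To make the behaviour of np.array_split concrete, here is a minimal sketch. The arrays x and y are hypothetical stand-ins filled with zeros; only the row count 27455 is taken from the question. It also shows that a row count not divisible by 28 gives pieces of unequal length, which is why the pieces cannot be stacked back into one regular array.

```python
import numpy as np

# Hypothetical stand-in array: 2800 rows, 10 columns.
x = np.zeros((2800, 10))
parts = np.array_split(x, 28)   # splits along axis 0 (rows)
print(len(parts))               # 28
print(parts[0].shape)           # (100, 10)

# Same row count as the training set in the question (27455),
# with only 2 columns to keep the example small.
y = np.zeros((27455, 2))
pieces = np.array_split(y, 28)
row_counts = sorted({p.shape[0] for p in pieces})
print(row_counts)               # [980, 981]
```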

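It looks like you actually want to reshape each row of training_file into a 28x28 array. You can use reshape() for that. To demonstrate, I'll generate a small example array that is easier to follow. Suppose training_file has shape (5, 10), and you want to turn each row of training_file[:, 1:] into a 3x3 array.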

training_file = np.arange(50).reshape((5,10))
# this gives:
array([[ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14, 15, 16, 17, 18, 19],
       [20, 21, 22, 23, 24, 25, 26, 27, 28, 29],
       [30, 31, 32, 33, 34, 35, 36, 37, 38, 39],
       [40, 41, 42, 43, 44, 45, 46, 47, 48, 49]])

labels = training_file[:, 0] # this is irrelevant to the answer
images = training_file[:, 1:] # take all columns from idx 1-end
# this gives:
array([[ 1,  2,  3,  4,  5,  6,  7,  8,  9],
       [11, 12, 13, 14, 15, 16, 17, 18, 19],
       [21, 22, 23, 24, 25, 26, 27, 28, 29],
       [31, 32, 33, 34, 35, 36, 37, 38, 39],
       [41, 42, 43, 44, 45, 46, 47, 48, 49]])

images_new = images.reshape((-1, 3, 3)) # reshape to shape (x, 3, 3) where x is as much as is required
# this gives:
array([[[ 1,  2,  3],
        [ 4,  5,  6],
        [ 7,  8,  9]],

       [[11, 12, 13],
        [14, 15, 16],
        [17, 18, 19]],

       [[21, 22, 23],
        [24, 25, 26],
        [27, 28, 29]],

       [[31, 32, 33],
        [34, 35, 36],
        [37, 38, 39]],

       [[41, 42, 43],
        [44, 45, 46],
        [47, 48, 49]]])

images_new.shape
# this is:
(5, 3, 3)
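One thing the small demo does not show is what happens when a row holds the wrong number of values. The sketch below uses a made-up array with 785 columns (one label column plus 784 pixel columns, an assumption that matches the 28x28 target) and compares the slice 1:784 with 1:785.

```python
import numpy as np

# Made-up data: 5 rows, 1 label column + 784 pixel columns.
data = np.arange(5 * 785).reshape((5, 785))

too_short = data[:, 1:784]      # stop index is exclusive: only 783 columns
print(too_short.shape)          # (5, 783)

reshape_failed = False
try:
    too_short.reshape((-1, 28, 28))
except ValueError as err:
    # 5 * 783 values cannot be divided into 28x28 blocks
    reshape_failed = True
    print("reshape failed:", err)

full = data[:, 1:785]           # all 784 pixel columns
print(full.shape)               # (5, 784)
print(full.reshape((-1, 28, 28)).shape)  # (5, 28, 28)
```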

So, to answer your question: instead of

images = training_file[:, 1:784]
images = np.array_split(images, 28)
images = np.asarray(images)

you want this (the slice has to end at 785, not 784, so that all 784 pixel columns are included):

images = training_file[:, 1:785]
images = images.reshape((-1, 28, 28))
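Putting this together, a corrected get_data might look like the sketch below. It is one possible version, not the only one, and it makes a few assumptions: the file has one header row, the label sits in column 0, and the remaining 784 columns are pixels. It also indexes the labels with 0 instead of [0], because training_file[:, [0]] keeps a second axis and gives shape (N, 1), while the expected shape is (N,). The dtype='str' argument is dropped so that np.loadtxt returns floats. The CSV used here is synthetic and written to a temporary file only so that the example runs on its own.

```python
import os
import tempfile

import numpy as np


def get_data(filename):
    # Assumes one header row, label in column 0, 784 pixel columns after it.
    data = np.loadtxt(filename, delimiter=",", skiprows=1)
    labels = data[:, 0]                        # shape (N,), not (N, 1)
    images = data[:, 1:].reshape(-1, 28, 28)   # each row becomes a 28x28 image
    return images, labels


# Synthetic stand-in for the real CSV: 5 rows of random values in 0..255.
rng = np.random.default_rng(0)
rows = rng.integers(0, 256, size=(5, 785))
header = ",".join(["label"] + [f"pixel{i}" for i in range(1, 785)])

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "fake_sign_mnist.csv")
    np.savetxt(path, rows, delimiter=",", header=header, comments="", fmt="%d")
    images, labels = get_data(path)

print(images.shape)  # (5, 28, 28)
print(labels.shape)  # (5,)
```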
