Splitting train/test sets based on a comparison operator

Problem description

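I'm trying to figure out how to split the data according to the following requirement so that I can run a CNN on it:

Split the training/test datasets into two groups: one with labels < 5 and one with labels >= 5. Print the shapes of the two resulting groups for both the training and test datasets.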

import tensorflow as tf
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from keras.datasets import mnist
from keras.utils import to_categorical
from tensorflow import keras

(train_images, train_labels), (test_images, test_labels) = tf.keras.datasets.cifar10.load_data()

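The code above is how I load the data. Below is how I interpreted the task, but I'm not sure I did it right, because the training images still have shape (50000, 32, 32, 3). I was wondering if anyone could help me with this.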

train_labels_first = train_labels[train_labels < 5]
test_labels_first = test_labels[test_labels < 5]


train_labels_second = train_labels[train_labels >= 5]
test_labels_second = test_labels[test_labels >= 5]

Tags: python, tensorflow, conv-neural-network, train-test-split

Solution


Just apply boolean indexing to your training and test images, using a mask built from the labels. For example:

# Labels have shape (N, 1); flatten them so the boolean mask is 1-D and selects along axis 0.
train_images_first = train_images[train_labels.ravel() < 5]
test_images_first = test_images[test_labels.ravel() < 5]

print(train_images_first.shape, test_images_first.shape)
>>> (25000, 32, 32, 3) (5000, 32, 32, 3)

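To get the matching labels, simply assign train_labels[train_labels < 5] to a new variable; that variable will hold the labels with values below 5 (as a flat 1-D array).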
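As a rough illustration of the approach described above, here is a minimal sketch that splits images and labels together with one shared mask. It uses small synthetic arrays laid out like the CIFAR-10 arrays (an assumption, so it runs without downloading anything), and `split_by_label` is a hypothetical helper name, not part of Keras or NumPy.

```python
import numpy as np

# Synthetic stand-in for CIFAR-10 (assumption): 100 samples, 10 per class,
# with labels shaped (N, 1) like the arrays returned by cifar10.load_data().
labels = np.repeat(np.arange(10), 10).reshape(-1, 1)
images = np.zeros((100, 32, 32, 3), dtype=np.uint8)
# Fill each image with its own label value so the selection can be checked.
images += labels.reshape(-1, 1, 1, 1).astype(np.uint8)


def split_by_label(images, labels, threshold=5):
    """Hypothetical helper: split into (labels < threshold) and (labels >= threshold)."""
    mask = labels.ravel() < threshold  # 1-D boolean mask, one entry per sample
    low = (images[mask], labels[mask])
    high = (images[~mask], labels[~mask])
    return low, high


(first_x, first_y), (second_x, second_y) = split_by_label(images, labels)
print(first_x.shape, first_y.shape)
print(second_x.shape, second_y.shape)
```

Because the same mask indexes both arrays, each image stays aligned with its label, and the labels keep their (N, 1) shape. The same call should work on the real `train_images`/`train_labels` and `test_images`/`test_labels` pairs.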

