Assign class label based on parent folder name when using tf.data

Problem description

I have prepared a dataset the same way I used to for ImageDataGenerator.flow_from_directory in Keras. I have three folders, "Train", "Valid" and "Test", and inside each one there are subfolders named after the class they represent. However, instead of images, the data in those subfolders is saved as compressed NumPy files (.npz). I found that it is possible to build an input pipeline that reads .npz files with tf.data, but the example in the documentation only shows how to load a dataset whose labels are stored inside the .npz file itself:

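import numpy as np
import tensorflow as tf

# `path` points to a single .npz file that stores both the examples and their labels
with np.load(path) as data:
  train_examples = data['x_train']
  train_labels = data['y_train']
  test_examples = data['x_test']
  test_labels = data['y_test']

train_dataset = tf.data.Dataset.from_tensor_slices((train_examples, train_labels))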

There is no explanation of how to build a dataset that automatically assigns a label to each file based on its parent folder (the way flow_from_directory does).

Is there a way to achieve that, or should I manually import the data from each folder and assign a one-hot encoded label to each subset? Thank you!

Tags: python, tensorflow, tf.data.dataset

Solution
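No answer was recorded on this page. One possible approach, offered as a minimal sketch rather than a definitive implementation, is to list the .npz files in Python, derive an integer label from each file's parent folder name, and then load the arrays lazily through tf.numpy_function. The helper names list_npz_files and make_dataset are hypothetical, and the sketch assumes each .npz file holds a single example under an array key (here "x") with a known shape; neither detail is stated in the question.

```python
import pathlib

import numpy as np
import tensorflow as tf


def list_npz_files(root):
    """Collect file paths and integer labels from a root/<class_name>/<file>.npz layout."""
    root = pathlib.Path(root)
    # Sorted folder names give a stable class -> index mapping.
    class_names = sorted(p.name for p in root.iterdir() if p.is_dir())
    paths, labels = [], []
    for index, name in enumerate(class_names):
        for npz_file in sorted((root / name).glob("*.npz")):
            paths.append(str(npz_file))
            labels.append(index)
    return paths, labels, class_names


def make_dataset(root, key="x", shape=None, shuffle=True):
    """Build a tf.data.Dataset of (example, one-hot label) pairs.

    `key` is the array name inside each .npz file (an assumption here);
    `shape` is the per-example shape, if it is known in advance.
    """
    paths, labels, class_names = list_npz_files(root)
    num_classes = len(class_names)

    def _load(path):
        # tf.numpy_function passes the string tensor in as bytes.
        with np.load(path.decode("utf-8")) as data:
            return data[key].astype(np.float32)

    def _parse(path, label):
        example = tf.numpy_function(_load, [path], tf.float32)
        if shape is not None:
            # numpy_function drops static shape information, so restore it.
            example.set_shape(shape)
        return example, tf.one_hot(label, num_classes)

    dataset = tf.data.Dataset.from_tensor_slices((paths, labels))
    if shuffle:
        # Shuffle the cheap (path, label) pairs before loading any arrays.
        dataset = dataset.shuffle(len(paths))
    dataset = dataset.map(_parse, num_parallel_calls=tf.data.AUTOTUNE)
    return dataset, class_names
```

With the folder layout from the question, calling make_dataset("Train", key="x", shape=...) would return the dataset together with the class names in index order, and the result can then be batched and prefetched as usual. Because tf.numpy_function runs ordinary Python code, the loading step is not serialized into the graph and is limited by the Python interpreter; if that becomes a bottleneck, converting the data to TFRecord files may be worth considering.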

