python - Reading Dataset from files where some might be missing
Problem description
I'm trying to load files into a TensorFlow Dataset where some files might be missing (in which case I want to replace them with zeros).
The structure of directories that I'm trying to read data from is as follows:
|-data
|---sensor_A
|-----1.dat
|-----2.dat
|-----3.dat
|---sensor_B
|-----1.dat
|-----2.dat
|-----3.dat
The .dat files are CSV-style text files with a space as the separator. Each file holds a single multi-row observation where the number of columns is constant (say 4) and the number of rows is unknown (time-series data).
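To make the file format concrete, here is a minimal plain-Python sketch of the parsing step: split the text on whitespace, convert the tokens to floats, and group them into rows of four. The helper name `parse_observation` and the sample values are made up for illustration; they are not part of the original data.

```python
def parse_observation(text, number_of_columns=4):
    """Split whitespace-separated text into rows of floats (hypothetical helper)."""
    # str.split() with no argument splits on spaces, \r and \n alike.
    tokens = text.split()
    if len(tokens) % number_of_columns != 0:
        raise ValueError("token count is not a multiple of the column count")
    values = [float(token) for token in tokens]
    # Group the flat list of values into rows of `number_of_columns` entries.
    return [values[i:i + number_of_columns]
            for i in range(0, len(values), number_of_columns)]


# Made-up sample content: two rows, four columns, Windows line endings.
sample = "0.1 0.2 0.3 0.4\r\n1.0 2.0 3.0 4.0\r\n"
rows = parse_observation(sample)
print(rows)
```

The number of rows falls out of the token count, which is why only the column count has to be known in advance.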
I've successfully managed to read each sensor's data into a separate TensorFlow Dataset with the following code:
import os
import tensorflow as tf

tf.enable_eager_execution()

data_root_dir = "data"
modalities_to_use = ["sensor_A", "sensor_B"]
timestamps = [1, 2, 3]

for mod_idx, modality in enumerate(modalities_to_use):
    # Will produce: ['data/sensor_A/1.dat', 'data/sensor_A/2.dat', 'data/sensor_A/3.dat']
    filenames = [os.path.join(data_root_dir, modality, str(timestamp) + ".dat") for timestamp in timestamps]
    dataset = tf.data.Dataset.from_tensor_slices((filenames,))

    def _parse_function_internal(filename):
        number_of_columns = 4
        single_observation = tf.read_file(filename)
        # Tokenise every value so we can cast these to floats later.
        single_observation = tf.string_split([single_observation], sep='\r\n ').values
        single_observation = tf.reshape(single_observation, (-1, number_of_columns))
        single_observation = tf.strings.to_number(single_observation, tf.float32)
        return filename, single_observation

    dataset = dataset.map(_parse_function_internal)

    print('Result:')
    for el in dataset:
        try:
            # Filename
            print(el[0])
            # Parsed file content
            print(el[1])
        except tf.errors.OutOfRangeError:
            break
This successfully prints the content of all three files for every sensor.
My problem is that some timestamps in the dataset might be missing. For instance, if the file 1.dat in the sensor_A directory is missing, I get this error:
tensorflow.python.framework.errors_impl.NotFoundError: NewRandomAccessFile failed to Create/Open: mock_data\sensor_A\1.dat : The system cannot find the file specified.
; No such file or directory
[[{{node ReadFile}}]] [Op:IteratorGetNextSync]
which is thrown in this line:
for el in dataset:
What I've tried is to surround the call to tf.read_file() with a try block, but obviously that doesn't work, because the error is thrown not when tf.read_file() is called but when the value is fetched from the dataset. Later I want to pass this dataset to a Keras model, so I can't just surround the iteration with a try block. Is there any workaround? Is this even supported?
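The deferred failure can be reproduced without TensorFlow. The sketch below is only a plain-Python analogy (it says nothing about TensorFlow internals): Python's built-in `map()` is also lazy, so a try block around the mapping call catches nothing, and the error only appears when an element is fetched. The helper `read_text` and the temporary path are hypothetical.

```python
import os
import tempfile


def read_text(path):
    # Raises FileNotFoundError if the path does not exist.
    with open(path) as handle:
        return handle.read()


# A path inside a fresh, empty temporary directory, so it is certainly missing.
paths = [os.path.join(tempfile.mkdtemp(), "1.dat")]

try:
    # map() is lazy: nothing is read here, so nothing can fail here.
    lazy_contents = map(read_text, paths)
    failed_at_definition = False
except FileNotFoundError:
    failed_at_definition = True

try:
    # The read only happens when an element is actually requested.
    next(lazy_contents)
    failed_at_fetch = False
except FileNotFoundError:
    failed_at_fetch = True

print(failed_at_definition, failed_at_fetch)
```

This is why the missing-file case has to be handled inside the mapped function itself rather than around it.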
Thanks!
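Solution
I managed to solve the problem and am sharing the solution in case anyone else gets stuck on it. I had to use an additional list of booleans that specifies whether each file actually exists and pass it to the mapper. Then, using the tf.cond() function, we decide whether to read the file or mock the data with zeros (or apply any other logic).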
import os
import tensorflow as tf

tf.enable_eager_execution()

data_root_dir = "data"
modalities_to_use = ["sensor_A", "sensor_B"]
timestamps = [1, 2, 3]

for mod_idx, modality in enumerate(modalities_to_use):
    # Will produce: ['data/sensor_A/1.dat', 'data/sensor_A/2.dat', 'data/sensor_A/3.dat']
    filenames = [os.path.join(data_root_dir, modality, str(timestamp) + ".dat") for timestamp in timestamps]
    # One boolean per filename, evaluated once, when the list is built.
    files_exist = [os.path.isfile(filename) for filename in filenames]
    dataset = tf.data.Dataset.from_tensor_slices((filenames, files_exist))

    def _parse_function_internal(filename, file_exist):
        number_of_columns = 4
        # Read the file if it exists; otherwise substitute a single row of zeros as text.
        # Both branches return a scalar string tensor.
        single_observation = tf.cond(
            file_exist,
            lambda: tf.read_file(filename),
            lambda: tf.constant(' '.join(['0.0'] * number_of_columns)))
        # Tokenise every value so we can cast these to floats later.
        single_observation = tf.string_split([single_observation], sep='\r\n ').values
        single_observation = tf.reshape(single_observation, (-1, number_of_columns))
        single_observation = tf.strings.to_number(single_observation, tf.float32)
        return filename, single_observation

    dataset = dataset.map(_parse_function_internal)

    print('Result:')
    for el in dataset:
        try:
            # Filename
            print(el[0])
            # Parsed file content
            print(el[1])
        except tf.errors.OutOfRangeError:
            break
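The same "substitute zeros when the file is missing" idea can also be sketched as a plain Python generator. This is a variation, not the method used above: it checks for the file at read time instead of when the file list is built, and uses only the standard library. The names `parse_observation` and `observations`, the temporary directory, and the sample values are all made up for illustration. Such a generator could presumably be wrapped with `tf.data.Dataset.from_generator`, but that step is an assumption and is not shown or tested here.

```python
import os
import tempfile


def parse_observation(text, number_of_columns):
    """Split whitespace-separated text into rows of floats (hypothetical helper)."""
    values = [float(token) for token in text.split()]
    return [values[i:i + number_of_columns]
            for i in range(0, len(values), number_of_columns)]


def observations(filenames, number_of_columns=4):
    """Yield (filename, rows); a missing file becomes a single row of zeros."""
    for filename in filenames:
        if os.path.isfile(filename):
            with open(filename) as handle:
                rows = parse_observation(handle.read(), number_of_columns)
        else:
            rows = [[0.0] * number_of_columns]
        yield filename, rows


# Demo in a temporary directory: 2.dat exists, 1.dat is deliberately missing.
sensor_dir = os.path.join(tempfile.mkdtemp(), "sensor_A")
os.makedirs(sensor_dir)
with open(os.path.join(sensor_dir, "2.dat"), "w") as handle:
    handle.write("1.0 2.0 3.0 4.0\n5.0 6.0 7.0 8.0\n")

filenames = [os.path.join(sensor_dir, str(t) + ".dat") for t in [1, 2]]
results = list(observations(filenames))
for filename, rows in results:
    print(os.path.basename(filename), rows)
```

Checking at read time avoids a stale existence list if files appear or disappear between building the list and iterating, at the cost of doing the file I/O in Python rather than in TensorFlow ops.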