tensorflow - How do I append an element to each sequence in a tf.data.Dataset?
Problem description
I want to use tf.data.Dataset to produce sequences with char2int['EOS'] appended to the end of each one. The code I wrote is below:
import tensorflow as tf

def _get_generator(list_of_text, char2int):
    def gen():
        for text in list_of_text:
            yield [char2int[x] for x in text]  # transform char to int
    return gen

def get_dataset(list_of_text, char2int):
    gen = _get_generator(list_of_text, char2int)
    dataset = tf.data.Dataset.from_generator(gen, (tf.int32), tf.TensorShape([None]))
    dataset = dataset.map(lambda seq: seq + [char2int['EOS']])  # intended to append EOS to the end of the line
    data_iter = dataset.make_initializable_iterator()
    return dataset, data_iter

char2int = {'EOS': 1, 'a': 2, 'b': 3, 'c': 4}
list_of_text = ['aaa', 'abc']  # the sequence data

with tf.Graph().as_default():
    dataset, data_iter = get_dataset(list_of_text, char2int)
    with tf.Session() as sess:
        sess.run(data_iter.initializer)
        tt1 = sess.run(data_iter.get_next())
        tt2 = sess.run(data_iter.get_next())
        print(tt1)  # got [3 3 3] but I want [2 2 2 1]
        print(tt2)  # got [3 4 5] but I want [2 3 4 1]
But this does not give the result I want: the map performs element-wise addition on each sequence instead of appending to it. How should I fix it? Thanks.
Solution
Inside your map function, seq is a tensor, so seq + [char2int['EOS']] adds 1 to every element (the one-element list is broadcast) instead of concatenating it. You can change your _get_generator to:
def _get_generator(list_of_text, char2int):
    def gen():
        for text in list_of_text:
            # transform char to int, then append EOS as a plain Python list concatenation
            yield [char2int[x] for x in text] + [char2int['EOS']]
    return gen
and remove the dataset.map call.
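If you would rather leave the generator untouched, another option is to keep the map step and concatenate explicitly with tf.concat. The sketch below is not from the original code: it assumes TensorFlow 2.4 or later with eager execution (so it uses output_signature and iterates over the dataset directly instead of using a Session), and the helper name append_eos is made up for illustration. It also reproduces the broadcasting behaviour that caused the unexpected output.

```python
import tensorflow as tf

char2int = {'EOS': 1, 'a': 2, 'b': 3, 'c': 4}
list_of_text = ['aaa', 'abc']

def gen():
    for text in list_of_text:
        yield [char2int[x] for x in text]  # transform char to int

def append_eos(seq):
    # Hypothetical helper: concatenates a one-element EOS tensor onto the
    # 1-D sequence tensor along axis 0 instead of adding to it.
    eos = tf.constant([char2int['EOS']], dtype=tf.int32)
    return tf.concat([seq, eos], axis=0)

# What the original lambda did: tensor + list broadcasts and adds 1 everywhere.
seq = tf.constant([2, 2, 2], dtype=tf.int32)
broadcast_sum = (seq + [char2int['EOS']]).numpy().tolist()
print(broadcast_sum)  # [3, 3, 3]

# Assumes TF >= 2.4, where from_generator accepts output_signature.
dataset = tf.data.Dataset.from_generator(
    gen, output_signature=tf.TensorSpec(shape=(None,), dtype=tf.int32))
dataset = dataset.map(append_eos)

results = [item.numpy().tolist() for item in dataset]
print(results)  # [[2, 2, 2, 1], [2, 3, 4, 1]]
```

In TensorFlow 1.x the same append_eos function should be usable in dataset.map with the Session-based code from the question; only the iteration part differs.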