How should I append an element to each sequence with tf.data.Dataset?

Problem description

I want to use tf.data.Dataset to produce sequence data with char2int['EOS'] appended to the end of each sequence. The code I wrote is below:

import tensorflow as tf 

def _get_generator(list_of_text, char2int):
    def gen():
        for text in list_of_text:
            yield [char2int[x] for x in text] # transform char to int
    return gen

def get_dataset(list_of_text, char2int):
    gen = _get_generator(list_of_text, char2int)
    dataset = tf.data.Dataset.from_generator(gen, (tf.int32), tf.TensorShape([None]))

    dataset = dataset.map(lambda seq: seq+[char2int['EOS']])  # append EOS to the end of line

    data_iter = dataset.make_initializable_iterator()

    return dataset, data_iter

char2int = {'EOS':1, 'a':2, 'b':3, 'c':4}
list_of_text = ['aaa', 'abc'] # the sequence data

with tf.Graph().as_default():
    dataset, data_iter = get_dataset(list_of_text, char2int)
    with tf.Session() as sess:
        sess.run(data_iter.initializer)
        tt1 = sess.run(data_iter.get_next())
        tt2 = sess.run(data_iter.get_next())
        print(tt1)  # got [3 3 3] but I want [2 2 2 1]
        print(tt2)  # got [3 4 5] but I want [2 3 4 1]

But this does not give me what I want: the map call performs element-wise addition on each sequence instead of appending to it. How should I fix it? Thanks.

Tags: tensorflow, tensorflow-datasets

Solution


Inside your map function, seq is a tensor, so seq + [char2int['EOS']] is element-wise (broadcast) addition: it adds 1 to every value instead of appending 1 to the sequence. You can change your _get_generator to:

def _get_generator(list_of_text, char2int):
    def gen():
        for text in list_of_text:
            # transform each char to int, then append EOS with list concatenation
            yield [char2int[x] for x in text] + [char2int['EOS']]
    return gen

and remove the dataset.map call.
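If you would rather keep the map step, one possible alternative is to concatenate inside it with tf.concat rather than using +. The following is only a minimal sketch under some assumptions: it targets TensorFlow 2.4 or later with eager execution (the output_signature argument and direct iteration over the dataset), which differs from the TF 1.x session code in the question, and append_eos is a hypothetical helper name introduced here for illustration.

```python
import tensorflow as tf

char2int = {'EOS': 1, 'a': 2, 'b': 3, 'c': 4}
list_of_text = ['aaa', 'abc']  # the sequence data


def gen():
    for text in list_of_text:
        yield [char2int[x] for x in text]  # transform char to int, no EOS yet


def append_eos(seq):
    # Hypothetical helper: seq is a 1-D int32 tensor here, so use tf.concat
    # to append; "+" would broadcast and add EOS to every element instead.
    eos = tf.constant([char2int['EOS']], dtype=tf.int32)
    return tf.concat([seq, eos], axis=0)


# Assumption: TF >= 2.4, where from_generator accepts output_signature.
dataset = tf.data.Dataset.from_generator(
    gen, output_signature=tf.TensorSpec(shape=(None,), dtype=tf.int32))
dataset = dataset.map(append_eos)

# Eager iteration replaces the initializable iterator and Session.
results = [seq.numpy().tolist() for seq in dataset]
print(results)  # [[2, 2, 2, 1], [2, 3, 4, 1]]
```

In graph-style 1.x code the same idea should apply: the body of the lambda passed to dataset.map would call tf.concat on seq and a one-element EOS tensor instead of using +.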

