Converting Embedding layer parameters to a Lambda layer

Problem description

I am reproducing an architecture that uses ELMo embeddings and a bidirectional LSTM. The first two layers look like this:

input_layer = Input(shape=(1,), dtype="string", name="Input_layer")
embedding_layer = Lambda(ELMoEmbedding, output_shape=(1024, ), name="Elmo_Embedding")(input_layer)

However, I am not sure how to plug them in place of my existing Keras embedding layer:

Embedding(len(vocab), embedding_dimension, input_length=maximal_sentence_length)

The input data is tokenized and mapped to IDs before training, so it is not the real string type that the ELMo implementation requires:

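import nltk
import numpy as np
from tqdm import tqdm

# get_label, load_data and pad_sentence are helpers defined elsewhere in the project.
def read_dataset(data_file, vocab_to_id, sent_len, debug=False):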
    '''
    read training set or test set
    :param data_file:
    :param vocab_to_id:
    :param sent_len: the
    :param debug: load only a small fraction of samples to debug
    :return: model's input and labels
    need about 1min31s for training set and 2min for test set
    '''

    labels, _ = get_label()
    unknown_id = len(vocab_to_id) - 1
    data_x, data_y = list(), list()
    cnt = 0
    
    for sample in tqdm(load_data(data_file)):
        # print(sample)

        # for debugging
        cnt += 1
        if debug and cnt > 100:
            break

        summary = str.lower(sample.summary)
        tokens = nltk.word_tokenize(summary)
        token_ids = [vocab_to_id.get(t, unknown_id) for t in tokens]
        token_ids = pad_sentence(token_ids, sent_len)
        data_x.append(token_ids)
        occupations = sample.occupation

        # train
        if occupations:
            y_vector = [1 if label in occupations else 0 for label in labels]
            data_y.append(y_vector)
        # test
        else:
            data_y.append(0)

    return np.array(data_x), np.array(data_y)

Tags: keras, nlp, elmo

Solution
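One possible direction, sketched under assumptions rather than taken from an accepted answer: since the Lambda layer is declared with `Input(shape=(1,), dtype="string")`, each sample would need to be a single sentence string instead of a padded list of token IDs. The helper names `to_elmo_inputs` and `ids_to_sentence`, the whitespace split (standing in for `nltk.word_tokenize`), and the `pad_id` parameter are all hypothetical and not part of the original code.

```python
def to_elmo_inputs(summaries, sent_len):
    """Build one-string-per-sample rows for a string-typed Input layer.

    Assumption: a plain whitespace split stands in for nltk.word_tokenize,
    and sentences are truncated to sent_len tokens instead of being padded.
    """
    rows = []
    for summary in summaries:
        tokens = summary.lower().split()[:sent_len]
        # Each row holds exactly one string, matching Input(shape=(1,)).
        rows.append([" ".join(tokens)])
    return rows


def ids_to_sentence(token_ids, id_to_vocab, pad_id):
    """Rebuild a sentence string from token IDs when only IDs are available.

    Assumption: pad_id is whatever ID the padding step inserted;
    id_to_vocab is the inverse of the vocab_to_id mapping.
    """
    words = [id_to_vocab[i] for i in token_ids if i != pad_id]
    return [" ".join(words)]
```

Before being passed to the model, the rows would probably need to be wrapped as `np.array(rows, dtype=object)`. Note also that `output_shape=(1024, )` describes one vector per sentence, whereas `Embedding(..., input_length=maximal_sentence_length)` yields one vector per token, so a downstream LSTM may need a per-token ELMo output instead.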

