Keras custom layer and custom loss function - need to maintain state

Problem description

Background: text summarization using an extractive approach.

The article I am following - link

EDIT 1: a link to the colab

The last layer in my network performs classification using features extracted from several inputs.

Inputs (? denotes the batch size):

  1. d = document_embeddings, shape = (?, 400)

  2. s = sentence_embeddings, shape = (?, 10, 400)
    (explanation - 10 sentences per document)

  3. h_state = the h_state of the LSTM that produces document_embeddings, shape (?, 10, 400) (explanation - 10 is the number of timesteps in the LSTM, one per sentence in each document, and 400 is the size)

Output:

  1. 1/0 for each sentence, so the shape is (10, 1)

In the last layer I use these inputs to compute the features:

C_j = W_c * s_j
M_j = s_j.T * W_s * d
N_j = s_j.T * W_r * tanh(O_j)
P_j = W_p * h_state

O_j is the summary representation of the document, computed by summing the product of each preceding sentence embedding with its probability of appearing in the summary:

for i in range(j):    # sentences 0 .. j-1
    sum += S_i * prob_in_summary(S_i)

where prob_in_summary for sentence i is computed by:

sigmoid(C_i + M_i - N_i + P_i + b)
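
To make the recurrence concrete, here is the same computation written out in plain NumPy for a single document (a sketch under my own naming; C, M, P are treated as precomputed per-sentence scalars and W_r is simplified to a vector):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def select_probs(S, C, M, P, W_r, b):
    """S: (10, 400) sentence embeddings of one document;
    C, M, P: per-sentence scalar features; W_r: (400,) weights."""
    O_j = np.zeros(S.shape[1])             # running summary representation
    probs = []
    for j in range(len(S)):
        N_j = S[j] @ (W_r * np.tanh(O_j))  # novelty w.r.t. summary so far
        p_j = sigmoid(C[j] + M[j] - N_j + P[j] + b)
        probs.append(p_j)
        O_j += S[j] * p_j                  # fold sentence j into the summary
    return probs

Note the state the loop carries: O_j is updated after every sentence, which is exactly what has to be reproduced inside the Keras layer (question 3 below).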

Now, the loss function to minimize over the whole model is the negative log-likelihood of the observed labels (pseudocode):

  loss(Weights, bias) =
    - sum over docs:
        sum over sentences:
          sent_label * log(prob(sent_label == 1 | S_emb, O_j, D_emb)) +
          (1 - sent_label) * log(1 - prob(sent_label == 1 | S_emb, O_j, D_emb))
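
This objective is exactly binary cross-entropy averaged over sentences, so writing it as a custom Keras loss (if one is wanted at all) is short; a minimal sketch:

from keras import backend as K

def neg_log_likelihood(y_true, y_pred):
    # y_true: 0/1 per-sentence labels, shape (?, 10)
    # y_pred: per-sentence sigmoid probabilities, shape (?, 10)
    y_pred = K.clip(y_pred, K.epsilon(), 1 - K.epsilon())  # avoid log(0)
    return -K.mean(y_true * K.log(y_pred) + (1 - y_true) * K.log(1 - y_pred))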

My questions are:

  1. I don't know where to plug this probability computation of the loss function into Keras.
  2. If what I get out of the sigmoid is a probability, how do I produce the label? I need something like "if probability > 0.7, decide 1, else 0" (see the sketch after this list).
  3. Where do I compute O_j for each sentence? I would need to keep some kind of state inside the layer.. but what the layer receives is the matrix of sentences, not one sentence at a time...
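
On question 2, a common approach (my suggestion, not from the article) is to train on the raw probabilities and threshold only at prediction time, outside the model and the loss:

import numpy as np

def to_labels(probs, threshold=0.7):
    """Turn per-sentence probabilities of shape (?, 10) into hard 0/1 labels."""
    return (probs > threshold).astype(np.int32)

# e.g. labels = to_labels(model.predict([d_batch, s_batch, h_batch]))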

My code so far:

The custom layer:

from keras import backend as K
from keras.layers import Layer


class MyLayer(Layer):

    def __init__(self, output_dim, **kwargs):
        self.output_dim = output_dim
        super(MyLayer, self).__init__(**kwargs)

    def build(self, input_shape):
        assert isinstance(input_shape, list)
        # Trainable weights; shape=(1,) is a scalar placeholder for now (see TODOs below)
        self.W_c = self.add_weight(name='W_c', shape=(1,), initializer='uniform', trainable=True)
        self.W_s = self.add_weight(name='W_s', shape=(1,), initializer='uniform', trainable=True)
        # self.W_r = self.add_weight(name='W_r', shape=(1,), initializer='uniform', trainable=True)
        self.W_p = self.add_weight(name='W_p', shape=(1,), initializer='uniform', trainable=True)
        # self.bias = self.add_weight(name='bias', shape=(1,), initializer='uniform', trainable=True)
        super(MyLayer, self).build(input_shape)  # Be sure to call this at the end


    def call(self, x):
        assert isinstance(x, list)
        document_embedding, sentences_embeddings_stacked, state_h = x

        content_richness = self.W_c * sentences_embeddings_stacked
        print("content_richness", content_richness.shape)

        print("sentences_embeddings_stacked", sentences_embeddings_stacked.shape)
        print("document_embedding", document_embedding.shape)
        print("document_embedding_repeat", K.repeat(document_embedding, 10).shape)
        novelty = sentences_embeddings_stacked * self.W_s # TODO transpose, * K.repeat(document_embedding, 10)
        print("novelty", novelty.shape)

        print("state_h", state_h.shape)
        position = self.W_p * state_h
        print("position", position.shape)

        return content_richness

    def compute_output_shape(self, input_shape):
        assert isinstance(input_shape, list)
        shape_a, shape_b, shape_c = input_shape
        # TODO what to put here? needs to be (?,10,1) or (?, 10) because 1/0 for each sentence in doc and there are 10 sentences
        return [(shape_a[0], self.output_dim), shape_b[:-1]]
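
Regarding the TODO in compute_output_shape: if the layer is to emit one probability per sentence, the output shape should be (?, 10). A sketch of the method under that assumption (output_dim == 10):

    # inside MyLayer:
    def compute_output_shape(self, input_shape):
        assert isinstance(input_shape, list)
        shape_d, shape_s, shape_h = input_shape
        return (shape_d[0], self.output_dim)  # (batch_size, 10)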

The custom loss:

  1. Do I need a custom loss at all? Or does negative log-likelihood of the observed labels already exist in Keras? (See the sketch below.)
  2. Given the function that computes prob_in_sentence, how do I compute y_pred inside the model? Where should it live, and where and how do I implement the for loop?
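
On question 1: Keras ships this objective as binary_crossentropy, so no custom loss is strictly needed once the model emits the per-sentence probabilities; a sketch with hypothetical tensors:

from keras.models import Model
from keras.optimizers import Adam

def compile_summarizer(inputs, sentence_probs):
    # sentence_probs: the (?, 10) output of the custom layer
    model = Model(inputs=inputs, outputs=sentence_probs)
    # binary_crossentropy == the negative log-likelihood written above
    model.compile(optimizer=Adam(), loss='binary_crossentropy')
    return model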

Tags: python, tensorflow, keras, deep-learning

Solution


Solved. I had to handle the batch size inside my custom layer, plus do some stacking and splitting.

import tensorflow as tf
from keras import backend as K
from keras.layers import Dense, Layer


class MyLayer(Layer):

    def __init__(self, output_dim, **kwargs):
        self.output_dim = output_dim
        super(MyLayer, self).__init__(**kwargs)

    def build(self, input_shape):
        # Create a trainable weight variable for this layer.
        self.W_p = self.add_weight(name='W_p',
                                   shape=(400,),
                                   initializer='uniform',
                                   trainable=True)
        self.W_c = self.add_weight(name='W_c',
                                   shape=(400,),
                                   initializer='uniform',
                                   trainable=True)

        self.W_s = self.add_weight(name='W_s',
                                   shape=(400,),
                                   initializer='uniform',
                                   trainable=True)

        self.W_r = self.add_weight(name='W_r',
                                   shape=(400,),
                                   initializer='uniform',
                                   trainable=True)
        super(MyLayer, self).build(input_shape)  # Be sure to call this at the end

    def call(self, x):
        def compute_sentence_features(d, sentences_embeddings_stacked, p_j, j, sentences_probs):
            s = sentences_embeddings_stacked[:, j]

            c = s * self.W_c
            m = s * self.W_s * d  # missing transpose

            o = 0
            if j == 0:
                o = sentences_embeddings_stacked[:, 0] * 0.5
            else:
                for i in range(0, j):
                    o += sentences_embeddings_stacked[:, i] * sentences_probs[i]

            n = s * self.W_r * K.tanh(o)  # missing transpose
            p = self.W_p * p_j
            return c, m, n, p, o

        def compute_sentence_prob(features):
            c, m, n, p = features
            sentence_prob = K.sigmoid(c + m - n + p)
            return sentence_prob

        document_embedding, sentences_embeddings_stacked, doc_lstm = x

        O = []
        sentences_probs = []
        for j in range(0, 10):  # one pass per sentence (10 sentences per document)
            c, m, n, p, o = compute_sentence_features(document_embedding, sentences_embeddings_stacked, doc_lstm[:, j], j, sentences_probs)
            print("c,m,n,p,o", c, m, n, p, o)
            sentences_probs.append(compute_sentence_prob((c, m, n, p)))
            O.append(o)

        sentences_probs_stacked = tf.stack(sentences_probs, axis=1)

        # project the summed probabilities back to one score per sentence
        dense4output10 = Dense(10, input_shape=(400,))(K.sum(sentences_probs_stacked, axis=1))

        output = K.softmax(dense4output10)  # missing bias
        print("output", output)
        return output

    def compute_output_shape(self, input_shape):
        return (input_shape[0][0], self.output_dim)
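
For completeness, a sketch of how this layer could be wired into a model under the shapes from the question (the input names and the loss choice are my assumptions, not part of the original answer):

from keras.layers import Input
from keras.models import Model

doc_emb = Input(shape=(400,), name='document_embedding')       # (?, 400)
sent_emb = Input(shape=(10, 400), name='sentence_embeddings')  # (?, 10, 400)
lstm_h = Input(shape=(10, 400), name='lstm_h_state')           # (?, 10, 400)

probs = MyLayer(output_dim=10)([doc_emb, sent_emb, lstm_h])    # (?, 10)

model = Model(inputs=[doc_emb, sent_emb, lstm_h], outputs=probs)
model.compile(optimizer='adam', loss='binary_crossentropy')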
