How to select a constant number of rows in a 2D tensor (selection optimization)

Problem description

Given a 2D tensor M of shape n*h, is there a way to select a constant number r (r < n) of rows from M? The selection depends on SGD minimizing the cross-entropy of the prediction error; I will skip the details. Duplicate rows are not allowed: it must be exactly r distinct rows, no more and no fewer. Is there a way to do this?
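For concreteness, a minimal sketch of the setup (the shapes, variable names, and the stand-in loss below are assumptions for illustration) shows why a naive selection is not trainable: tf.gather needs integer indices, and no gradient flows back to them, so SGD cannot update which rows are selected.

import tensorflow as tf
import numpy as np

n, h, r = 100, 20, 5                                    # assumed sizes
M = tf.constant(np.random.rand(n, h), dtype=tf.float32)
# r distinct integer row indices, initialized without repetition
idx = tf.Variable(np.random.choice(n, size=r, replace=False).astype(np.int32))
selected = tf.gather(M, idx)                            # shape (r, h)
loss = tf.reduce_sum(selected ** 2)                     # stand-in for the real cross-entropy loss
# tf.gradients(loss, idx) returns [None]: integer indices receive no gradient,
# so a plain SGD step cannot change the selection directly.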

Tags: python, tensorflow

Solution


Here is a kind of proof of concept of how you might try to go about this, although I doubt that you can get anything practically useful out of it. Indexing is not differentiable, so what you might do is treat your tensor "as if" it were a continuous function (or a collection of continuous functions, one per column), evaluated at integer positions. Then, from a differentiation point of view, you could consider linear interpolation between the integer positions to make it "really" continuous, and that is something you can differentiate. The following script implements this idea:

import tensorflow as tf
import numpy as np

# Problem: find rows with smallest norm
rows, cols = 100, 200
np.random.seed(100)
# Make random sine waves
a = np.random.rand(cols)
f = np.random.rand(cols)
p = 2 * np.pi * np.random.rand(cols)
x = np.linspace(0, 2 * np.pi, rows)[:, np.newaxis]
rand_sin = a * np.sin(f * x + p)
# Solution: row indices sorted by norm
rows_order = np.argsort(np.linalg.norm(rand_sin, axis=-1))
print('Rows ranking:')
print(rows_order)
print()

@tf.custom_gradient
def index_with_gradient(tensor, index):
    # Round index and convert to integer
    index_int = tf.to_int32(tf.round(index))
    # Make sure it does not go out of bounds
    index_int = tf.clip_by_value(index_int, 0, tf.shape(tensor)[0] - 1)
    # These are the selected rows
    out = tf.gather(tensor, index_int)
    # Gradient function
    def grad(dy):
        # Add extra initial and final row so we can take differences
        padded = tf.pad(tensor, [[1, 1], [0, 0]], 'SYMMETRIC')
        # Take differences
        diff = padded[1:] - padded[:-1]
        # Take differences with the previous and with the next
        diff1, diff2 = diff[:-1], diff[1:]
        # Gradients from "left" and "right" for the current indices
        g1 = tf.gather(diff1, index_int)
        g2 = tf.gather(diff2, index_int)
        # Compute error between integer and real-valued index
        index_err = index - tf.cast(index_int, index.dtype)
        # Clip error
        index_err = tf.clip_by_value(index_err, -0.5, 0.5)
        index_err = tf.expand_dims(index_err, -1)
        # Mix left and right gradients according to error
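        # i.e. with w = 0.5 + index_err in [0, 1]:
        #   g ~= (1 - w) * (M[i] - M[i-1]) + w * (M[i+1] - M[i])
        # a linear blend of the backward and forward row differences around the rounded index i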
        g = (0.5 - index_err) * g1 + (0.5 + index_err) * g2
        # Aggregate gradients
        g_out = tf.cast(tf.reduce_sum(dy * g, axis=-1), index.dtype)
        # No gradient for tensor, only for index
        return None, g_out
    return out, grad

tf.set_random_seed(100)
# Input tensor
tensor = tf.constant(rand_sin, dtype=tf.float32)
# Index to optimize - must be real so it can be optimized
index = tf.Variable(np.random.randint(rows, size=2), dtype=tf.float32)
# Integer version of the index
index_int = tf.to_int32(tf.round(index))
index_int = tf.clip_by_value(index_int, 0, tf.shape(tensor)[0] - 1)
# Select rows using function with gradient
selected_rows = index_with_gradient(tensor, index)
# Loss value is sum of norms
loss = tf.reduce_sum(tf.norm(selected_rows, axis=-1))
# Pick learning rate for SGD optimizer
learning_rate = 1.0
# Optimization
train_op = tf.train.GradientDescentOptimizer(learning_rate).minimize(loss)
init_op = tf.global_variables_initializer()
# Test
with tf.Session() as sess:
    sess.run(init_op)
    for i in range(1001):
        sess.run(train_op)
        if i % 100 == 0:
            print(f'Iter {i}')
            i1, i2 = sess.run(index_int)
            rank1 = np.where(rows_order == i1)[0][0]
            rank2 = np.where(rows_order == i2)[0][0]
            print(f'- Index 1: {i1} (ranking: {rank1})')
            print(f'- Index 2: {i2} (ranking: {rank2})')
            print()

Output:

Rows ranking:
[ 5  6  4  7  3  8  2  9  1 10  0 11 12 13 14 15 16 17 18 19 20 76 75 77
 74 78 73 79 72 80 71 81 21 70 82 69 83 84 22 68 85 67 86 23 66 87 65 88
 24 89 64 90 25 63 91 92 62 93 26 94 61 95 27 96 60 97 98 99 28 59 58 29
 57 30 56 31 55 32 54 33 53 34 52 35 51 36 50 49 37 48 38 47 39 46 40 45
 41 44 42 43]

Iter 0
- Index 1: 67 (ranking: 41)
- Index 2: 89 (ranking: 49)

Iter 100
- Index 1: 68 (ranking: 39)
- Index 2: 88 (ranking: 47)

Iter 200
- Index 1: 70 (ranking: 33)
- Index 2: 87 (ranking: 45)

Iter 300
- Index 1: 70 (ranking: 33)
- Index 2: 85 (ranking: 40)

Iter 400
- Index 1: 71 (ranking: 30)
- Index 2: 84 (ranking: 37)

Iter 500
- Index 1: 72 (ranking: 28)
- Index 2: 83 (ranking: 36)

Iter 600
- Index 1: 72 (ranking: 28)
- Index 2: 82 (ranking: 34)

Iter 700
- Index 1: 73 (ranking: 26)
- Index 2: 82 (ranking: 34)

Iter 800
- Index 1: 73 (ranking: 26)
- Index 2: 81 (ranking: 31)

Iter 900
- Index 1: 74 (ranking: 24)
- Index 2: 80 (ranking: 29)

Iter 1000
- Index 1: 73 (ranking: 26)
- Index 2: 80 (ranking: 29)

As you can see, the ranking of the indices actually decreases progressively, although it rather quickly falls into what appears to be a local minimum and never gets closer to the actual solution (ranking 0). Also, this solution does not enforce that the selected row indices are actually different. Since an optimizer is updating the indices, it is not straightforward to put restrictions on them. You might think of something like optimizing one index over the whole tensor, then another index over the whole tensor minus that row, and so on, although the implementation of the selection and its gradient would probably become significantly more complicated.
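One softer direction (not part of the answer above, just a sketch under assumptions; the function name and penalty weight are made up for illustration) is to keep a single optimizer but add a penalty that pushes the real-valued indices at least one row apart. This only discourages duplicates rather than enforcing distinctness:

import tensorflow as tf

def distinctness_penalty(index, weight=1.0):
    # Pairwise absolute distances between the real-valued indices
    diff = tf.abs(tf.expand_dims(index, 0) - tf.expand_dims(index, 1))
    # Mask out the diagonal (distance of each index to itself)
    mask = 1.0 - tf.eye(tf.shape(index)[0])
    # Hinge penalty: positive only when two indices are less than one row apart
    overlap = tf.nn.relu(1.0 - diff) * mask
    # Each pair appears twice in the symmetric matrix, hence the division by 2
    return weight * tf.reduce_sum(overlap) / 2.0

# Hypothetical usage with the script above, before building train_op:
# loss = tf.reduce_sum(tf.norm(selected_rows, axis=-1)) + distinctness_penalty(index)

Because the penalty acts on the continuous index variable, gradients still flow through it, but two indices can still round to the same row if the data term outweighs the penalty.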

