python - 如何在 2D 张量中选择恒定的行数(选择优化)
问题描述
给定一个 的二维张M
量n*h
。有没有一种方法可以选择恒定数量r (r < n)
的M
. 选择取决于 SGD 以最小化预测误差的交叉熵,我将忽略细节。不允许重复行。它必须是r
不同的行,不多不少。有没有办法可以做到这一点?
解决方案
Here is a kind of proof of concept of how you might try to go about this, although I doubt that you can get anything practically useful out of it. Indexing is not differentiable, so what you might do is treat your tensor "as if" it were a continuous function (or a collection of continuous functions, each in one column), evaluated at integer positions. Then, from differentiation point of view, you could consider linear interpolation between the integer positions to make it "really" continuous, and that would be something you can differentiate. The following script implements this idea:
import tensorflow as tf
import numpy as np
# Problem: find rows with smallest norm
rows, cols = 100, 200
np.random.seed(100)
# Make random sine waves
a = np.random.rand(cols)
f = np.random.rand(cols)
p = 2 * np.pi * np.random.rand(cols)
x = np.linspace(0, 2 * np.pi, rows)[:, np.newaxis]
rand_sin = a * np.sin(f * x + p)
# Solution: row indices sorted by norm
rows_order = np.argsort(np.linalg.norm(rand_sin, axis=-1))
print('Rows ranking:')
print(rows_order)
print()
@tf.custom_gradient
def index_with_gradient(tensor, index):
# Round index and convert to integer
index_int = tf.to_int32(tf.round(index))
# Make sure it does not go out of bounds
index_int = tf.clip_by_value(index_int, 0, tf.shape(tensor)[0] - 1)
# These are the selected rows
out = tf.gather(tensor, index_int)
# Gradient function
def grad(dy):
# Add extra initial and final row so we can take differences
padded = tf.pad(tensor, [[1, 1], [0, 0]], 'SYMMETRIC')
# Take differences
diff = padded[1:] - padded[:-1]
# Take differences with the previous and with the next
diff1, diff2 = diff[:-1], diff[1:]
# Gradients from "left" and "right" for the current indices
g1 = tf.gather(diff1, index_int)
g2 = tf.gather(diff2, index_int)
# Compute error between integer and real-valued index
index_err = index - tf.cast(index_int, index.dtype)
# Clip error
index_err = tf.clip_by_value(index_err, -0.5, 0.5)
index_err = tf.expand_dims(index_err, -1)
# Mix left and right gradients according to error
g = (0.5 - index_err) * g1 + (0.5 + index_err) * g2
# Aggregate gradients
g_out = tf.cast(tf.reduce_sum(dy * g, axis=-1), index.dtype)
# No gradient for tensor, only for index
return None, g_out
return out, grad
tf.set_random_seed(100)
# Input tensor
tensor = tf.constant(rand_sin, dtype=tf.float32)
# Index to optimize - must be real so it can be optimized
index = tf.Variable(np.random.randint(rows, size=2), dtype=tf.float32)
# Integer version of the index
index_int = tf.to_int32(tf.round(index))
index_int = tf.clip_by_value(index_int, 0, tf.shape(tensor)[0] - 1)
# Select rows using function with gradient
selected_rows = index_with_gradient(tensor, index)
# Loss value is sum of norms
loss = tf.reduce_sum(tf.norm(selected_rows, axis=-1))
# Pick learning rate for SGD optimizer
learning_rate = 1.0
# Optimization
train_op = tf.train.GradientDescentOptimizer(learning_rate).minimize(loss)
init_op = tf.global_variables_initializer()
# Test
with tf.Session() as sess:
sess.run(init_op)
for i in range(1001):
sess.run(train_op)
if i % 100 == 0:
print(f'Iter {i}')
i1, i2 = sess.run(index_int)
rank1 = np.where(rows_order == i1)[0][0]
rank2 = np.where(rows_order == i2)[0][0]
print(f'- Index 1: {i1} (ranking: {rank1})')
print(f'- Index 2: {i2} (ranking: {rank2})')
print()
Output:
Rows ranking:
[ 5 6 4 7 3 8 2 9 1 10 0 11 12 13 14 15 16 17 18 19 20 76 75 77
74 78 73 79 72 80 71 81 21 70 82 69 83 84 22 68 85 67 86 23 66 87 65 88
24 89 64 90 25 63 91 92 62 93 26 94 61 95 27 96 60 97 98 99 28 59 58 29
57 30 56 31 55 32 54 33 53 34 52 35 51 36 50 49 37 48 38 47 39 46 40 45
41 44 42 43]
Iter 0
- Index 1: 67 (ranking: 41)
- Index 2: 89 (ranking: 49)
Iter 100
- Index 1: 68 (ranking: 39)
- Index 2: 88 (ranking: 47)
Iter 200
- Index 1: 70 (ranking: 33)
- Index 2: 87 (ranking: 45)
Iter 300
- Index 1: 70 (ranking: 33)
- Index 2: 85 (ranking: 40)
Iter 400
- Index 1: 71 (ranking: 30)
- Index 2: 84 (ranking: 37)
Iter 500
- Index 1: 72 (ranking: 28)
- Index 2: 83 (ranking: 36)
Iter 600
- Index 1: 72 (ranking: 28)
- Index 2: 82 (ranking: 34)
Iter 700
- Index 1: 73 (ranking: 26)
- Index 2: 82 (ranking: 34)
Iter 800
- Index 1: 73 (ranking: 26)
- Index 2: 81 (ranking: 31)
Iter 900
- Index 1: 74 (ranking: 24)
- Index 2: 80 (ranking: 29)
Iter 1000
- Index 1: 73 (ranking: 26)
- Index 2: 80 (ranking: 29)
As you can see, the ranking of the indices is actually reduce progressively, although it rather quickly falls into what appears to be a local minimum and does not ever get closer to the actual solution (ranking 0). Also, this solution does not enforce that the selected row indices are actually different. Since it is an optimizer updating the indices, it is not straightforward to put restrictions on it. You might think of something like optimizing an index over the whole tensor, then an index over the whole tensor but one row, etc. although the implementation of the selection and the gradient would probably become significantly more complicated.
推荐阅读
- java - 在android studio中移动到下一页时出错
- android - 添加新列表后,删除 Spinner Adapter 中的选定值(选中的单选按钮)
- regex - 正则表达式规则匹配特定字符前后的空格
- testing - 使用 TestCafe 选择器,如何验证 <select> 中所选项目的文本?
- reactjs - 如果没有连接,则显示小吃吧
- java - 关于如何将数据键入移动应用程序的盒装空间的建议
- python - 我必须从多个文本文件(.txt)中识别段落并创建一个数据框 [paragraph1, "text of the file1 in paragraphs"]
- python-3.x - Altair 中具有自定义置信区间的折线图
- oracle - java.sql.SQLDataException: ORA-01830: 日期格式图片在转换整个输入字符串之前结束 (Oracle)
- javascript - 我必须在 D3 中使用“+(x)+”、“+(y)+”什么?而不是使用 d3.event.x&d3.event.y?