python - Training a submodel instead of the full model in TensorFlow Federated
Question
I am trying to modify the TensorFlow Federated example. I want to create a submodel from the original model, use the newly created model in the training phase, and then send its weights to the server so that it can update the original model.
I know this should not be done inside client_update; the server should send the correct submodel directly to the clients, but for now I prefer this approach.
Now I have two problems:
- It seems I cannot create a new model inside the client_update function like this:
@tf.function
def client_update(model, dataset, server_message, client_optimizer):
  """Performs client local training of `model` on `dataset`.

  Args:
    model: A `tff.learning.Model`.
    dataset: A `tf.data.Dataset`.
    server_message: A `BroadcastMessage` from server.
    client_optimizer: A `tf.keras.optimizers.Optimizer`.

  Returns:
    A `ClientOutput`.
  """
  model_weights = model.weights
  import dropout_model
  dropout_model = dropout_model.get_dropoutmodel(model)
  initial_weights = server_message.model_weights
  tf.nest.map_structure(lambda v, t: v.assign(t), model_weights,
                        initial_weights)
  .....
The error is this:
ValueError: tf.function-decorated function tried to create variables on non-first call.
The model is created like this:
def from_original_to_submodel(only_digits=True):
  """The CNN model used in https://arxiv.org/abs/1602.05629.

  Args:
    only_digits: If True, uses a final layer with 10 outputs, for use with the
      digits only EMNIST dataset. If False, uses 62 outputs for the larger
      dataset.

  Returns:
    An uncompiled `tf.keras.Model`.
  """
  data_format = 'channels_last'
  input_shape = [28, 28, 1]
  max_pool = functools.partial(
      tf.keras.layers.MaxPooling2D,
      pool_size=(2, 2),
      padding='same',
      data_format=data_format)
  conv2d = functools.partial(
      tf.keras.layers.Conv2D,
      kernel_size=5,
      padding='same',
      data_format=data_format,
      activation=tf.nn.relu)
  model = tf.keras.models.Sequential([
      conv2d(filters=32, input_shape=input_shape),
      max_pool(),
      conv2d(filters=64),
      max_pool(),
      tf.keras.layers.Flatten(),
      tf.keras.layers.Dense(410, activation=tf.nn.relu),  # 20% dropout
      tf.keras.layers.Dense(10 if only_digits else 62),
  ])
  return model


def get_dropoutmodel(model):
  keras_model = from_original_to_submodel(only_digits=False)
  loss = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
  return tff.learning.from_keras_model(
      keras_model, loss=loss, input_spec=model.input_spec)
- This one is more of a theoretical question. I want to train a submodel as described, so I would take the original model weights sent from the server (initial_weights) and, for each layer, assign a random sub-list of those weights to the submodel. For example, if initial_weights contains 100 elements for layer 6 and the same layer of my submodel has only 40, I would pick 40 elements at random using a seed, train, and then send the seed to the server so that it selects the same indices and updates only those. Is that correct? My alternative version would still create 100 elements (40 random ones and 60 equal to 0), but I think that would hurt model performance during server-side aggregation.
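The seed-sharing idea in the first option can be sketched without any TFF machinery: as long as client and server derive the indices from the same seed with the same procedure, they agree on which 40 of the 100 weights were trained. A minimal stdlib-only sketch; the function and parameter names are illustrative, not part of TFF:

```python
import random

def select_indices(seed, full_size, sub_size):
    """Deterministically pick `sub_size` of `full_size` weight indices."""
    rng = random.Random(seed)
    return sorted(rng.sample(range(full_size), sub_size))

# The client picks indices, trains those weights, and sends only the seed.
client_indices = select_indices(seed=1234, full_size=100, sub_size=40)

# The server re-derives the same indices from the received seed and
# updates only those positions of the full layer.
server_indices = select_indices(seed=1234, full_size=100, sub_size=40)
```

Because the generator is seeded identically on both sides, `client_indices == server_indices` always holds, so sending the seed is enough to reconstruct the index set.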
EDIT:
I have modified the client_update_fn function like this:
@tff.tf_computation(tf_dataset_type, server_message_type)
def client_update_fn(tf_dataset, server_message):
  model = model_fn()
  submodel = submodel_fn()
  client_optimizer = client_optimizer_fn()
  return client_update(model, submodel, tf_dataset, server_message,
                       client_optimizer)
adding a new parameter to the build_federated_averaging_process function like this:
def build_federated_averaging_process(
    model_fn, submodel_fn,
    server_optimizer_fn=lambda: tf.keras.optimizers.SGD(learning_rate=1.0),
    client_optimizer_fn=lambda: tf.keras.optimizers.SGD(learning_rate=0.1)):
In main.py I did this:
def tff_submodel_fn():
  keras_model = create_submodel_dropout(only_digits=False)
  loss = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
  return tff.learning.from_keras_model(
      keras_model, loss=loss,
      input_spec=train_data.element_type_structure)


iterative_process = simple_fedavg_tff.build_federated_averaging_process(
    tff_model_fn, tff_submodel_fn, server_optimizer_fn, client_optimizer_fn)
Now inside client_update I can use the submodel:
@tf.function
def client_update(model, submodel, dataset, server_message, client_optimizer):
  """Performs client local training of `model` on `dataset`.

  Args:
    model: A `tff.learning.Model`.
    submodel: A `tff.learning.Model` wrapping the submodel.
    dataset: A `tf.data.Dataset`.
    server_message: A `BroadcastMessage` from server.
    client_optimizer: A `tf.keras.optimizers.Optimizer`.

  Returns:
    A `ClientOutput`.
  """
  model_weights = model.weights
  initial_weights = server_message.model_weights
  submodel_weights = submodel.weights
  tf.nest.map_structure(lambda v, t: v.assign(t), submodel_weights,
                        initial_weights)
  num_examples = tf.constant(0, dtype=tf.int32)
  loss_sum = tf.constant(0, dtype=tf.float32)
  # Explicit use of `iter` for the dataset is a trick that makes TFF more
  # robust in GPU simulation and slightly more performant in the
  # unconventional usage of a large number of small datasets.
  weights_delta = []
  testing = False
  if not testing:
    for batch in iter(dataset):
      with tf.GradientTape() as tape:
        outputs = model.forward_pass(batch)
      grads = tape.gradient(outputs.loss, submodel_weights.trainable)
      client_optimizer.apply_gradients(zip(grads, submodel_weights.trainable))
      batch_size = tf.shape(batch['x'])[0]
      num_examples += batch_size
      loss_sum += outputs.loss * tf.cast(batch_size, tf.float32)
  weights_delta = tf.nest.map_structure(lambda a, b: a - b,
                                        submodel_weights.trainable,
                                        initial_weights.trainable)
  client_weight = tf.cast(num_examples, tf.float32)
  return ClientOutput(weights_delta, client_weight, loss_sum / client_weight)
I get this error:
ValueError: No gradients provided for any variable: ['conv2d_2/kernel:0', 'conv2d_2/bias:0', 'conv2d_3/kernel:0', 'conv2d_3/bias:0', 'dense_2/kernel:0', 'dense_2/bias:0', 'dense_3/kernel:0', 'dense_3/bias:0'].
Fatal Python error: Segmentation fault
Current thread 0x00007f27af18b740 (most recent call first):
File "virtual-environment/lib/python3.8/site-packages/tensorflow/python/framework/ops.py", line 1853 in _create_c_op
File "virtual-environment/lib/python3.8/site-packages/tensorflow/python/framework/ops.py", line 2041 in __init__
File "virtual-environment/lib/python3.8/site-packages/tensorflow/python/framework/ops.py", line 3557 in _create_op_internal
File "virtual-environment/lib/python3.8/site-packages/tensorflow/python/framework/func_graph.py", line 599 in _create_op_internal
File "virtual-environment/lib/python3.8/site-packages/tensorflow/python/framework/op_def_library.py", line 748 in _apply_op_helper
File "virtual-environment/lib/python3.8/site-packages/tensorflow/python/ops/gen_dataset_ops.py", line 1276 in delete_iterator
File "virtual-environment/lib/python3.8/site-packages/tensorflow/python/data/ops/iterator_ops.py", line 549 in __del__
Process finished with exit code 11
The submodel is now identical to the original model: I copied the body of create_original_fedavg_cnn_model into create_submodel_dropout, so I do not understand what is going wrong.
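One likely culprit, judging from the code above (an inference, not confirmed in this thread): the forward pass runs through model, while gradients are requested for submodel's variables. The tape therefore never records any operation touching the submodel's weights, tape.gradient returns None for every one of them, and apply_gradients fails with exactly this "No gradients provided" error. The effect in isolation:

```python
import tensorflow as tf

watched = tf.Variable(2.0)    # stands in for the original model's weights
unwatched = tf.Variable(3.0)  # stands in for the submodel's weights

with tf.GradientTape() as tape:
    # The loss depends only on `watched`, mirroring a forward pass that
    # goes through `model` rather than `submodel`.
    loss = watched * watched

# Differentiating w.r.t. a variable the tape never saw yields None,
# which is what apply_gradients rejects with "No gradients provided".
grads = tape.gradient(loss, [unwatched])
```

If this diagnosis is right, the forward pass would need to run through the object whose variables are being optimized.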
Answer
Generally, we cannot create variables inside a tf.function, because the method will be reused in TFF computations; technically, variables can only be created on the first call of a tf.function. We can see that in most TFF library code, the model is actually created outside the tf.function and passed into the tf.function as a parameter (for example: https://github.com/tensorflow/federated/blob/44d012f690005ecf9217e3be970a4f8a356e88ed/tensorflow_federated/python/examples/simple_fedavg/simple_fedavg_tff.py#L101). Another possibility to research could be the tf.init_scope context, but make sure you fully read all the documentation on its caveats and behavior.
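The pattern described above can be sketched as follows: the Keras model, and with it all of its variables, is constructed once outside the traced function and only used inside it. Plain TensorFlow here; the TFF wrapping is omitted and the names are illustrative:

```python
import tensorflow as tf

# Variables are created here, OUTSIDE any tf.function, exactly once.
model = tf.keras.Sequential([tf.keras.layers.Dense(4, input_shape=(3,))])

@tf.function
def client_step(m, x):
    # Inside the tf.function the model is only *used*; no variables are
    # created, so repeated calls and retraces are safe.
    return m(x)

out = client_step(model, tf.ones([2, 3]))
```

This is why simple_fedavg constructs model_fn() inside client_update_fn (a tff.tf_computation, traced once per computation) and passes the result into the tf.function-decorated client_update rather than building it there.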
TFF has a newer communication primitive called tff.federated_select that may be very useful here. It comes with two tutorials:
- Sending Different Data To Particular Clients With tff.federated_select, which discusses the communication primitive in detail.
- Client-efficient large-model federated learning via federated_select and sparse aggregation, which demonstrates federated learning of linear regression using federated_select, and demonstrates the need for "sparse aggregation", which is exactly the difficulty you found with zero padding.