python - Error in tf.keras.layers.TextVectorization when building it from a saved config and weights
Problem description
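I tried to write a Python program that saves a tf.keras.layers.TextVectorization layer to disk and loads it again, following the answer to "How to save TextVectorization to disk in tensorflow?".

When the `output_sequence_length` argument is not None and `output_mode='int'`, a TextVectorization layer built from the saved config outputs a vector of the wrong length. For example, with `output_sequence_length=10` and `output_mode='int'`, I expect TextVectorization to map a text to a vector of length 10; see `vectorizer` and `new_v2` in the code below. However, when the `output_mode='int'` argument is taken from the saved config, the layer does not output a vector of length 10. It outputs a vector of length 9, the actual length of the sentence, as if `output_sequence_length` had not been applied. See the object `new_v1` in the code below. Interestingly, I compared `from_disk['config']['output_mode']` with `'int'`, and they are equal.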
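The equality check above only shows that the loaded string has the same value as `'int'`; it does not show that it is the same object. One possible explanation, not confirmed here, is that some code path in the layer compares `output_mode` by identity (`is`) rather than by equality (`==`): a literal `'int'` would pass such a check, while a string rebuilt by `pickle` might not. This stdlib-only sketch (no TensorFlow; the dict is a stand-in for `vectorizer.get_config()`) shows the difference:

```python
import pickle

# Stand-in for vectorizer.get_config(); only the two relevant keys.
config = {"output_mode": "int", "output_sequence_length": 10}

# Round-trip through pickle, as the script does with tv_layer.pkl.
loaded = pickle.loads(pickle.dumps(config))

mode = loaded["output_mode"]
literal = "int"

# Equality compares values: True, matching the question's check.
print("equal:", mode == literal)

# Identity compares objects: the unpickler builds a new str object,
# so this may be False even though the values are equal.
print("identical:", mode is literal)
```

In CPython the identity line is likely to print False, but string identity is an implementation detail, so treat that result as indicative only.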
import tensorflow as tf
from tensorflow.keras.models import load_model
import pickle
# In[]
max_len = 10 # Sequence length to pad the outputs to.
text_dataset = tf.data.Dataset.from_tensor_slices([
    "I like natural language processing",
    "You like computer vision",
    "I like computer games and computer science"])
# Fit a TextVectorization layer
VOCAB_SIZE = 10 # Maximum vocab size.
vectorizer = tf.keras.layers.TextVectorization(
    max_tokens=None,
    standardize="lower_and_strip_punctuation",
    split="whitespace",
    output_mode='int',
    output_sequence_length=max_len
)
vectorizer.adapt(text_dataset.batch(64))
# In[]
#print(vectorizer.get_vocabulary())
#print(vectorizer.get_config())
#print(vectorizer.get_weights())
# In[]
# Pickle the config and weights
# (use `with` so the file handles are closed)
with open("./models/tv_layer.pkl", "wb") as f:
    pickle.dump({'config': vectorizer.get_config(),
                 'weights': vectorizer.get_weights()}, f)
# Later you can unpickle and use
# `config` to create object and
# `weights` to load the trained weights.
with open("./models/tv_layer.pkl", "rb") as f:
    from_disk = pickle.load(f)
new_v1 = tf.keras.layers.TextVectorization(
    max_tokens=None,
    standardize="lower_and_strip_punctuation",
    split="whitespace",
    output_mode=from_disk['config']['output_mode'],
    output_sequence_length=from_disk['config']['output_sequence_length'],
)
# You have to call `adapt` with some dummy data (BUG in Keras)
new_v1.adapt(tf.data.Dataset.from_tensor_slices(["xyz"]))
new_v1.set_weights(from_disk['weights'])
new_v2 = tf.keras.layers.TextVectorization(
    max_tokens=None,
    standardize="lower_and_strip_punctuation",
    split="whitespace",
    output_mode='int',
    output_sequence_length=from_disk['config']['output_sequence_length'],
)
# You have to call `adapt` with some dummy data (BUG in Keras)
new_v2.adapt(tf.data.Dataset.from_tensor_slices(["xyz"]))
new_v2.set_weights(from_disk['weights'])
print ("*"*10)
# In[]
test_sentence="Jack likes computer scinece, computer games, and foreign language"
print(vectorizer(test_sentence))
print (new_v1(test_sentence))
print (new_v2(test_sentence))
print(from_disk['config']['output_mode']=='int')
Here is the output of the print() calls:
**********
tf.Tensor([ 1 1 3 1 3 11 12 1 10 0], shape=(10,), dtype=int64)
tf.Tensor([ 1 1 3 1 3 11 12 1 10], shape=(9,), dtype=int64)
tf.Tensor([ 1 1 3 1 3 11 12 1 10 0], shape=(10,), dtype=int64)
True
Does anyone know why?
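If an identity comparison on `output_mode` is indeed the cause (an assumption, not verified against any particular TensorFlow release), one workaround to try is to intern the string values of the loaded config with `sys.intern` before passing them to the constructor. Interning makes the loaded `'int'` the same object as any other interned `'int'`, such as a module-level string constant. `intern_config_strings` is a hypothetical helper, and the pickle round-trip simulates `from_disk` without TensorFlow:

```python
import pickle
import sys


def intern_config_strings(config):
    """Hypothetical helper: return a copy of `config` whose top-level
    str values are interned with sys.intern; other values are unchanged."""
    return {
        key: sys.intern(value) if isinstance(value, str) else value
        for key, value in config.items()
    }


# Simulate the pickle round-trip that produces `from_disk` in the question.
saved = {"config": {"output_mode": "int",
                    "output_sequence_length": 10,
                    "standardize": "lower_and_strip_punctuation"}}
from_disk = pickle.loads(pickle.dumps(saved))

config = intern_config_strings(from_disk["config"])

# After interning, the loaded value is the canonical interned "int" object.
print(config["output_mode"] is sys.intern("int"))
```

With TensorFlow available, `new_v1` would then be built with `output_mode=config['output_mode']`. Passing the literal `'int'`, as `new_v2` does, is the simpler workaround when the mode is known in advance.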