python - Building an encoder-decoder
Problem description
I am new to deep learning and I am trying to build an encoder.
import re

# Assumed imports: the original snippet did not show where Tokenizer and pad_sequences come from.
from tensorflow.keras.preprocessing.text import Tokenizer, text_to_word_sequence
from tensorflow.keras.preprocessing.sequence import pad_sequences

trainFromTextFile = "train.FROM"
trainToTextFile = "train.TO"

# Read both corpora and lowercase them
with open(trainFromTextFile, 'r', encoding='utf-8') as f:
    trainFromText = f.read().lower()
with open(trainToTextFile, 'r', encoding='utf-8') as f:
    trainToText = f.read().lower()

# One sentence per line; words are separated by spaces or newlines
trainFromSentence = trainFromText.split('\n')
trainToSentence = trainToText.split('\n')
trainFromWords = re.split(r' |\n', trainFromText)
trainToWords = re.split(r' |\n', trainToText)
print('Found %s sentences from TrainFrom Text' % len(trainFromSentence))
print('Found %s sentences from TrainTo Text' % len(trainToSentence))
print('Found %s words from TrainFrom Text' % len(trainFromWords))
print('Found %s words from TrainTo Text' % len(trainToWords))

# Use only the first 1000 sentence pairs
trainInput = trainFromSentence[0:1000]
trainTarget = trainToSentence[0:1000]
max_len = 100  # Pad or truncate every sentence to 100 tokens
max_words = 10000  # Consider the top 10,000 words in the dataset

tokenizerInput = Tokenizer(num_words=max_words)
tokenizerInput.fit_on_texts(trainInput)
# text_to_word_sequence is a module-level function that takes one string, not a Tokenizer method
wordInput = [text_to_word_sequence(sentence) for sentence in trainInput]
sequencesInput = tokenizerInput.texts_to_sequences(trainInput)
sequencesInput = pad_sequences(sequencesInput, maxlen=max_len)  # Pad so all the arrays are the same size
Inputindex = tokenizerInput.word_index
Inputcount = tokenizerInput.word_counts
nInput = len(tokenizerInput.word_counts) + 1  # +1 because index 0 is reserved for padding

print("Train From File:\n")
print('Found %s sentences.' % len(trainInput))
print('Found %s sequences.' % len(sequencesInput))
print('Found %s unique tokens.' % len(Inputindex))
print('Found %s unique words.' % len(Inputcount))
This is what I have so far. I would like to know how to take the data I have and build an encoder that accepts it.
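To make concrete what an encoder would receive, here is a dependency-free sketch of roughly what `fit_on_texts`, `texts_to_sequences` and `pad_sequences` produce. The helper names (`build_word_index`, `texts_to_ids`, `pad_pre`) and the two sample sentences are hypothetical, and this simplified version only splits on whitespace, whereas the Keras `Tokenizer` also filters punctuation by default.

```python
from collections import Counter


def build_word_index(sentences):
    """Map each word to an integer id; the most frequent word gets id 1.

    Id 0 is never assigned because it is reserved for padding.
    """
    counts = Counter(word for s in sentences for word in s.lower().split())
    return {word: i + 1 for i, (word, _) in enumerate(counts.most_common())}


def texts_to_ids(sentences, word_index, max_words=None):
    """Turn sentences into lists of ids, keeping only ids below max_words."""
    result = []
    for s in sentences:
        ids = [word_index[w] for w in s.lower().split() if w in word_index]
        if max_words is not None:
            ids = [i for i in ids if i < max_words]
        result.append(ids)
    return result


def pad_pre(sequences, max_len):
    """Keep the last max_len ids and left-pad with 0 (the pad_sequences default)."""
    padded = []
    for seq in sequences:
        tail = seq[-max_len:]
        padded.append([0] * (max_len - len(tail)) + tail)
    return padded


# Hypothetical sample data standing in for trainInput
sentences = ["hello how are you", "how are you doing today"]
word_index = build_word_index(sentences)
sequences = texts_to_ids(sentences, word_index, max_words=10000)
padded = pad_pre(sequences, max_len=5)
print(word_index)
print(padded)
```

Each row of `padded` has the same length, so the whole batch is a rectangular integer array of shape (number of sentences, max_len). That is the input an embedding layer at the front of an encoder expects.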
Solution
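One possible approach, offered as a minimal sketch rather than a definitive answer: feed the padded integer sequences into an `Embedding` layer followed by an `LSTM` with `return_state=True`, and keep the final hidden and cell states as the encoded representation that a decoder could later be initialized with. The `tensorflow.keras` imports, `embedding_dim`, `latent_dim` and the random stand-in batch are assumptions; only `max_len` and `max_words` come from the question.

```python
import numpy as np
from tensorflow.keras.layers import Input, Embedding, LSTM
from tensorflow.keras.models import Model

max_len = 100        # padded sentence length, as in the question
max_words = 10000    # vocabulary cap, as in the question
embedding_dim = 64   # assumed size of each word vector
latent_dim = 128     # assumed size of the encoder state

# The encoder takes a batch of padded id sequences of shape (batch, max_len)
encoder_inputs = Input(shape=(max_len,), name="encoder_inputs")

# mask_zero=True tells downstream layers to ignore the padding id 0
embedded = Embedding(input_dim=max_words,
                     output_dim=embedding_dim,
                     mask_zero=True)(encoder_inputs)

# return_state=True also returns the final hidden state and cell state
_, state_h, state_c = LSTM(latent_dim, return_state=True)(embedded)

encoder = Model(encoder_inputs, [state_h, state_c], name="encoder")

# Random ids standing in for sequencesInput, just to check the shapes
dummy_batch = np.random.randint(1, max_words, size=(4, max_len))
h, c = encoder.predict(dummy_batch, verbose=0)
print(h.shape, c.shape)
```

With real data, `sequencesInput` from the question would replace `dummy_batch`. The target sentences in `trainTarget` would need their own tokenizer and padding before a decoder could be trained on top of these states.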