python - Too many false positives in wake word detection
Problem description
I am trying to implement wake word detection with TensorFlow, but the model flags most noise as the wake word (false positives).
The negative-word dataset contains 679 clips, taken from [https://www.kaggle.com/mozillaorg/common-voice][1].
The trigger-word dataset contains about 200 clips, but each training example gets a random pitch and speed perturbation.
The final training set is about 600 examples, and I reach 94% accuracy. (Because of memory limits I cannot enlarge the training set; I had to fit the model three times on batches of 500 examples for 10 epochs each.)
My model:
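For context, the random speed perturbation described above can be approximated with plain NumPy resampling. This is a minimal sketch under my own assumptions (the rate range 0.9–1.1 is illustrative; it is not the asker's actual pipeline, which would more typically use librosa's `time_stretch`/`pitch_shift`):

```python
import numpy as np

def speed_perturb(wave: np.ndarray, rate: float) -> np.ndarray:
    """Resample a 1-D waveform to simulate a speed change.

    rate > 1.0 speeds the clip up (shorter output);
    rate < 1.0 slows it down (longer output).
    """
    n_out = int(round(len(wave) / rate))
    # Linear interpolation onto a stretched time grid.
    old_t = np.arange(len(wave))
    new_t = np.linspace(0, len(wave) - 1, n_out)
    return np.interp(new_t, old_t, wave)

rng = np.random.default_rng(0)
clip = rng.standard_normal(16000)   # 1 s of fake audio at 16 kHz (placeholder data)
rate = rng.uniform(0.9, 1.1)        # assumed per-example random speed factor
augmented = speed_perturb(clip, rate)
```

Because each perturbed copy counts as a new example, this is how ~200 trigger clips can be stretched into a larger effective training set.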
from tensorflow.keras.layers import (Input, Conv1D, BatchNormalization,
                                     Activation, Dropout, LSTM,
                                     TimeDistributed, Dense)
from tensorflow.keras.models import Model

X_input = Input(shape=input_shape)

# Step 1: CONV layer
X = Conv1D(32, kernel_size=15, strides=4)(X_input)
X = BatchNormalization()(X)
X = Activation('relu')(X)
X = Dropout(0.3)(X)

# Step 2: first recurrent layer
# (LSTM with 512 units in place of the course template's 128-unit GRU)
X = LSTM(units=512, return_sequences=True)(X)
X = Dropout(0.3)(X)
X = BatchNormalization()(X)

# Step 3: second recurrent layer
X = LSTM(units=512, return_sequences=True)(X)
X = Dropout(0.4)(X)
X = BatchNormalization()(X)
X = Dropout(0.4)(X)

# Step 4: time-distributed dense layer, one sigmoid output per time step
X = TimeDistributed(Dense(1, activation="sigmoid"))(X)

model = Model(inputs=X_input, outputs=X)
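One common lever against the class imbalance implied by this per-timestep sigmoid output (wake-word frames are rare) is to weight the two classes differently in the loss. A hedged NumPy-only sketch of building per-frame sample weights; the function name and the weight values are my own illustrative assumptions, not from the question (tf.keras `fit` accepts a 2-D `sample_weight` array for temporal weighting of sequence outputs):

```python
import numpy as np

def frame_sample_weights(y: np.ndarray, pos_weight: float,
                         neg_weight: float = 1.0) -> np.ndarray:
    """Per-frame weights for labels y of shape (batch, timesteps, 1).

    Frames labelled positive get pos_weight, the rest neg_weight.
    Raising neg_weight relative to pos_weight penalises false
    positives more heavily.
    """
    y = y.squeeze(-1)                            # -> (batch, timesteps)
    w = np.where(y > 0.5, pos_weight, neg_weight)
    return w.astype("float32")

# Toy labels: one short positive burst in the first example.
labels = np.zeros((2, 10, 1))
labels[0, 3:6, 0] = 1.0
weights = frame_sample_weights(labels, pos_weight=4.0)
```

The resulting array would be passed as `sample_weight` alongside `x` and `y` when calling `model.fit`.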
Here is the model summary:
Model: "functional_1"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_1 (InputLayer) [(None, 5511, 101)] 0
_________________________________________________________________
conv1d (Conv1D) (None, 1375, 32) 48512
_________________________________________________________________
batch_normalization (BatchNo (None, 1375, 32) 128
_________________________________________________________________
activation (Activation) (None, 1375, 32) 0
_________________________________________________________________
dropout (Dropout) (None, 1375, 32) 0
_________________________________________________________________
lstm (LSTM) (None, 1375, 512) 1116160
_________________________________________________________________
dropout_1 (Dropout) (None, 1375, 512) 0
_________________________________________________________________
batch_normalization_1 (Batch (None, 1375, 512) 2048
_________________________________________________________________
lstm_1 (LSTM) (None, 1375, 512) 2099200
_________________________________________________________________
dropout_2 (Dropout) (None, 1375, 512) 0
_________________________________________________________________
batch_normalization_2 (Batch (None, 1375, 512) 2048
_________________________________________________________________
dropout_3 (Dropout) (None, 1375, 512) 0
_________________________________________________________________
time_distributed (TimeDistri (None, 1375, 1) 513
=================================================================
Total params: 3,268,609
Trainable params: 3,266,497
Non-trainable params: 2,112
Can anyone tell me how to reduce the false positives?

[1]: https://www.kaggle.com/mozillaorg/common-voice
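One inexpensive post-processing answer to this question, independent of the model itself, is to raise the decision threshold and require several consecutive positive frames before firing. A sketch with illustrative numbers (the threshold 0.8 and run length 5 are assumptions, not values from the question):

```python
import numpy as np

def detect(frame_probs: np.ndarray, threshold: float = 0.8,
           min_run: int = 5) -> bool:
    """Fire only if at least min_run consecutive frame
    probabilities exceed threshold.

    Raising threshold and min_run trades missed detections
    for fewer false positives.
    """
    run = 0
    for p in frame_probs:
        run = run + 1 if p > threshold else 0
        if run >= min_run:
            return True
    return False

# Noise hovering below threshold vs. a clip with a confident burst.
noise = np.full(100, 0.6)
wake = np.concatenate([np.full(50, 0.1), np.full(10, 0.95), np.full(40, 0.1)])
```

Here `detect(noise)` stays quiet while `detect(wake)` fires, because only the second clip has a sustained run of high-probability frames.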