Too many false positives in wake word detection

Problem description

I am trying to implement wake word detection with TensorFlow, but it flags most noise as the trigger word, i.e. it produces many false positives.

The negative-word dataset has 679 samples, taken from [https://www.kaggle.com/mozillaorg/common-voice][1].

The trigger-word dataset has about 200 samples, but the pitch and speed of each training clip are randomly altered as augmentation.
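For reference, a minimal sketch of that kind of speed/pitch augmentation, using plain NumPy resampling as a crude stand-in for a proper pitch shift or time stretch (such as librosa's); the function name and the rate range are my own assumptions:

```python
import numpy as np

def random_speed_pitch(wave, low=0.9, high=1.1, rng=None):
    """Crudely alter speed (and hence pitch) by resampling the waveform.

    A rate > 1 plays faster and higher, < 1 slower and lower. Linear
    interpolation is a rough approximation of a real resampler.
    """
    if rng is None:
        rng = np.random.default_rng()
    rate = rng.uniform(low, high)
    old_idx = np.arange(len(wave))
    new_idx = np.arange(0, len(wave), rate)  # fewer samples when rate > 1
    return np.interp(new_idx, old_idx, wave)
```

A dedicated audio library would change pitch and speed independently; this sketch couples them, which is often still a useful augmentation.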

In the end the training set is about 600 examples, and I reached 94% accuracy. (Because of memory limits I cannot grow the training set further; I have to fit the model 3 times on chunks of 500 examples, 10 epochs each.)
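Fitting in memory-sized chunks like that can be expressed with a small generator. This is only a sketch of the workflow described above; `chunked`, `X_train`, and `y_train` are illustrative names, and `model` is assumed to be the compiled Keras model:

```python
import numpy as np

def chunked(X, y, chunk_size=500):
    """Yield successive (X, y) slices of at most chunk_size examples,
    so only one chunk needs to be resident in memory per fit() call."""
    for start in range(0, len(X), chunk_size):
        yield X[start:start + chunk_size], y[start:start + chunk_size]

# Usage sketch (model assumed to be compiled elsewhere):
# for X_chunk, y_chunk in chunked(X_train, y_train):
#     model.fit(X_chunk, y_chunk, epochs=10)
```

Keras can also consume a generator or `tf.keras.utils.Sequence` directly in `model.fit()`, which avoids loading the whole dataset at once.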

My model:

    X_input = Input(shape=input_shape)

    # Step 1: CONV layer
    X = Conv1D(32, kernel_size=15, strides=4)(X_input)
    X = BatchNormalization()(X)
    X = Activation('relu')(X)
    X = Dropout(0.3)(X)

    # Step 2: first recurrent layer (LSTM in place of the suggested GRU)
    X = LSTM(units=512, return_sequences=True)(X)
    X = Dropout(0.3)(X)
    X = BatchNormalization()(X)

    # Step 3: second recurrent layer
    X = LSTM(units=512, return_sequences=True)(X)
    X = Dropout(0.4)(X)
    X = BatchNormalization()(X)
    X = Dropout(0.4)(X)

    # Step 4: time-distributed dense layer, sigmoid per time step
    X = TimeDistributed(Dense(1, activation="sigmoid"))(X)

Here is the model summary:

    Model: "functional_1"
    _________________________________________________________________
    Layer (type)                 Output Shape              Param #   
    =================================================================
    input_1 (InputLayer)         [(None, 5511, 101)]       0         
    _________________________________________________________________
    conv1d (Conv1D)              (None, 1375, 32)          48512     
    _________________________________________________________________
    batch_normalization (BatchNo (None, 1375, 32)          128       
    _________________________________________________________________
    activation (Activation)      (None, 1375, 32)          0         
    _________________________________________________________________
    dropout (Dropout)            (None, 1375, 32)          0         
    _________________________________________________________________
    lstm (LSTM)                  (None, 1375, 512)         1116160   
    _________________________________________________________________
    dropout_1 (Dropout)          (None, 1375, 512)         0         
    _________________________________________________________________
    batch_normalization_1 (Batch (None, 1375, 512)         2048      
    _________________________________________________________________
    lstm_1 (LSTM)                (None, 1375, 512)         2099200   
    _________________________________________________________________
    dropout_2 (Dropout)          (None, 1375, 512)         0         
    _________________________________________________________________
    batch_normalization_2 (Batch (None, 1375, 512)         2048      
    _________________________________________________________________
    dropout_3 (Dropout)          (None, 1375, 512)         0         
    _________________________________________________________________
    time_distributed (TimeDistri (None, 1375, 1)           513       
    =================================================================
    Total params: 3,268,609
    Trainable params: 3,266,497
    Non-trainable params: 2,112

Can you tell me how to reduce the false positives?

  [1]: https://www.kaggle.com/mozillaorg/common-voice

Tags: python, tensorflow, deep-learning, time-series, sequence
