Python SpeechRecognition Snowboy integration appears to be broken

Problem description

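I'm building a personal assistant in Python. The Python SpeechRecognition library appears to have built-in Snowboy hotword detection, but it seems to be broken. Here is my code. (Note that the problem is that the listen() function never returns.)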

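import speech_recognition as sr
from SnowboyDependencies import snowboydecoder

r = sr.Recognizer()  # the recognizer instance that the original snippet used without defining

def get_text():
    # hotword_path is defined elsewhere (not shown): path to a Snowboy .pmdl or .umdl model file
    with sr.Microphone(sample_rate=48000) as source:
        # PROBLEM HERE: this call never returns
        audio = r.listen(source, snowboy_configuration=("SnowboyDependencies", [hotword_path]))
    try:
        text = r.recognize_google(audio).lower()
    except (sr.UnknownValueError, sr.RequestError):
        text = None
        print("err")
    return text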

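I did some digging in SpeechRecognition and found where the problem is, but I'm not sure how to fix it because I'm not familiar with the library's internals. The problem is that listen() never returns. Snowboy hotword detection itself appears to work 100%, because the program moves on when I say my hotword. Here is the source code. I added three comments of my own to describe the problem further; each is enclosed in a multi-line box of # characters.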

def listen(self, source, timeout=None, phrase_time_limit=None, snowboy_configuration=None):
    """
    Records a single phrase from ``source`` (an ``AudioSource`` instance) into an ``AudioData`` instance, which it returns.

    This is done by waiting until the audio has an energy above ``recognizer_instance.energy_threshold`` (the user has started speaking), and then recording until it encounters ``recognizer_instance.pause_threshold`` seconds of non-speaking or there is no more audio input. The ending silence is not included.

    The ``timeout`` parameter is the maximum number of seconds that this will wait for a phrase to start before giving up and throwing an ``speech_recognition.WaitTimeoutError`` exception. If ``timeout`` is ``None``, there will be no wait timeout.

    The ``phrase_time_limit`` parameter is the maximum number of seconds that this will allow a phrase to continue before stopping and returning the part of the phrase processed before the time limit was reached. The resulting audio will be the phrase cut off at the time limit. If ``phrase_timeout`` is ``None``, there will be no phrase time limit.

    The ``snowboy_configuration`` parameter allows integration with `Snowboy <https://snowboy.kitt.ai/>`__, an offline, high-accuracy, power-efficient hotword recognition engine. When used, this function will pause until Snowboy detects a hotword, after which it will unpause. This parameter should either be ``None`` to turn off Snowboy support, or a tuple of the form ``(SNOWBOY_LOCATION, LIST_OF_HOT_WORD_FILES)``, where ``SNOWBOY_LOCATION`` is the path to the Snowboy root directory, and ``LIST_OF_HOT_WORD_FILES`` is a list of paths to Snowboy hotword configuration files (`*.pmdl` or `*.umdl` format).

    This operation will always complete within ``timeout + phrase_timeout`` seconds if both are numbers, either by returning the audio data, or by raising a ``speech_recognition.WaitTimeoutError`` exception.
    """
    assert isinstance(source, AudioSource), "Source must be an audio source"
    assert source.stream is not None, "Audio source must be entered before listening, see documentation for ``AudioSource``; are you using ``source`` outside of a ``with`` statement?"
    assert self.pause_threshold >= self.non_speaking_duration >= 0
    if snowboy_configuration is not None:
        assert os.path.isfile(os.path.join(snowboy_configuration[0], "snowboydetect.py")), "``snowboy_configuration[0]`` must be a Snowboy root directory containing ``snowboydetect.py``"
        for hot_word_file in snowboy_configuration[1]:
            assert os.path.isfile(hot_word_file), "``snowboy_configuration[1]`` must be a list of Snowboy hot word configuration files"

    seconds_per_buffer = float(source.CHUNK) / source.SAMPLE_RATE
    pause_buffer_count = int(math.ceil(self.pause_threshold / seconds_per_buffer))  # number of buffers of non-speaking audio during a phrase, before the phrase should be considered complete
    phrase_buffer_count = int(math.ceil(self.phrase_threshold / seconds_per_buffer))  # minimum number of buffers of speaking audio before we consider the speaking audio a phrase
    non_speaking_buffer_count = int(math.ceil(self.non_speaking_duration / seconds_per_buffer))  # maximum number of buffers of non-speaking audio to retain before and after a phrase

    # read audio input for phrases until there is a phrase that is long enough
    elapsed_time = 0  # number of seconds of audio read
    buffer = b""  # an empty buffer means that the stream has ended and there is no data left to read

    ##################################################
    ######THE ISSUE IS THAT THIS LOOP NEVER EXITS#####
    ##################################################
    while True:
        frames = collections.deque()

        if snowboy_configuration is None:
            # store audio input until the phrase starts
            while True:
                # handle waiting too long for phrase by raising an exception
                elapsed_time += seconds_per_buffer
                if timeout and elapsed_time > timeout:
                    raise WaitTimeoutError("listening timed out while waiting for phrase to start")

                buffer = source.stream.read(source.CHUNK)
                if len(buffer) == 0: break  # reached end of the stream
                frames.append(buffer)
                if len(frames) > non_speaking_buffer_count:  # ensure we only keep the needed amount of non-speaking buffers
                    frames.popleft()

                # detect whether speaking has started on audio input
                energy = audioop.rms(buffer, source.SAMPLE_WIDTH)  # energy of the audio signal
                if energy > self.energy_threshold: break

                # dynamically adjust the energy threshold using asymmetric weighted average
                if self.dynamic_energy_threshold:
                    damping = self.dynamic_energy_adjustment_damping ** seconds_per_buffer  # account for different chunk sizes and rates
                    target_energy = energy * self.dynamic_energy_ratio
                    self.energy_threshold = self.energy_threshold * damping + target_energy * (1 - damping)
        else:
            # read audio input until the hotword is said
            #############################################################
            ########THIS IS WHERE THE HOTWORD DETECTION OCCURS. HOTWORDS ARE DETECTED. I KNOW THIS BECAUSE THE PROGRAM PROGRESSES PAST THIS PART.
            #############################################################
            snowboy_location, snowboy_hot_word_files = snowboy_configuration
            buffer, delta_time = self.snowboy_wait_for_hot_word(snowboy_location, snowboy_hot_word_files, source, timeout)
            elapsed_time += delta_time
            if len(buffer) == 0: break  # reached end of the stream
            frames.append(buffer)

        # read audio input until the phrase ends
        pause_count, phrase_count = 0, 0
        phrase_start_time = elapsed_time

        while True:
            # handle phrase being too long by cutting off the audio
            elapsed_time += seconds_per_buffer
            if phrase_time_limit and elapsed_time - phrase_start_time > phrase_time_limit:
                break

            buffer = source.stream.read(source.CHUNK)
            if len(buffer) == 0: break  # reached end of the stream
            frames.append(buffer)
            phrase_count += 1

            # check if speaking has stopped for longer than the pause threshold on the audio input
            energy = audioop.rms(buffer, source.SAMPLE_WIDTH)  # unit energy of the audio signal within the buffer
            if energy > self.energy_threshold:
                pause_count = 0
            else:
                pause_count += 1
            if pause_count > pause_buffer_count:  # end of the phrase
                break

        # check how long the detected phrase is, and retry listening if the phrase is too short
        phrase_count -= pause_count  # exclude the buffers for the pause before the phrase
        ############################################################################
        #######THE FOLLOWING CONDITION IS NEVER MET, THEREFORE THE LOOP NEVER EXITS AND THE FUNCTION NEVER RETURNS#######
        ############################################################################
        if phrase_count >= phrase_buffer_count or len(buffer) == 0: break  # phrase is long enough or we've reached the end of the stream, so stop listening

    # obtain frame data
    for i in range(pause_count - non_speaking_buffer_count): frames.pop()  # remove extra non-speaking frames at the end
    frame_data = b"".join(frames)

    return AudioData(frame_data, source.SAMPLE_RATE, source.SAMPLE_WIDTH)

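The problem is that the main while loop in listen() never exits, and I'm not sure why. Note that the SpeechRecognition module works perfectly when I don't integrate Snowboy, and that Snowboy works perfectly on its own.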
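
One way to reason about the exit condition, offered as a sketch rather than a confirmed diagnosis: the check at the end of the outer loop only passes if enough buffers read after the hotword exceed energy_threshold. In the listen() source quoted above, the dynamic threshold adjustment sits only in the branch taken when snowboy_configuration is None, so with Snowboy enabled the threshold stays wherever it was set. The pure-Python replay below mirrors the buffer-count arithmetic and the phrase loop. The function names are hypothetical, the default values (0.8, 0.3, 0.5, a threshold of 300, a chunk size of 1024) are assumed Recognizer and Microphone defaults, and the energy lists are made-up inputs, not measurements.

```python
import math


def buffer_counts(chunk, sample_rate, pause_threshold=0.8,
                  phrase_threshold=0.3, non_speaking_duration=0.5):
    """Mirror the buffer-count arithmetic at the top of listen().

    The threshold defaults are assumptions about the Recognizer defaults.
    """
    seconds_per_buffer = float(chunk) / sample_rate
    pause = int(math.ceil(pause_threshold / seconds_per_buffer))
    phrase = int(math.ceil(phrase_threshold / seconds_per_buffer))
    non_speaking = int(math.ceil(non_speaking_duration / seconds_per_buffer))
    return pause, phrase, non_speaking


def phrase_is_long_enough(energies, energy_threshold,
                          pause_buffer_count, phrase_buffer_count):
    """Replay the 'read audio input until the phrase ends' loop.

    `energies` is a list of per-buffer RMS values (hypothetical input).
    The stream-end and phrase_time_limit checks are left out for brevity.
    Returns True if the phrase-length half of the exit condition holds.
    """
    pause_count, phrase_count = 0, 0
    for energy in energies:
        phrase_count += 1
        if energy > energy_threshold:
            pause_count = 0
        else:
            pause_count += 1
        if pause_count > pause_buffer_count:  # end of the phrase
            break
    phrase_count -= pause_count  # same subtraction as in listen()
    return phrase_count >= phrase_buffer_count


if __name__ == "__main__":
    pause, phrase, _ = buffer_counts(chunk=1024, sample_rate=48000)
    silence_after_hotword = [100] * 60
    speech_after_hotword = [1000] * 20 + [100] * 60
    print(phrase_is_long_enough(silence_after_hotword, 300, pause, phrase))
    print(phrase_is_long_enough(speech_after_hotword, 300, pause, phrase))
```

If the replay matches what happens on the Pi, silence or a very short utterance after the hotword would send listen() back to waiting for the hotword instead of returning. The timeout parameter is at least passed through to snowboy_wait_for_hot_word(), which raises WaitTimeoutError once the wait for the hotword exceeds it.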

I've also included speech_recognition's Recognizer.snowboy_wait_for_hot_word() method, since the problem may lie there.

def snowboy_wait_for_hot_word(self, snowboy_location, snowboy_hot_word_files, source, timeout=None):
    print("made it")
    # load snowboy library (NOT THREAD SAFE)
    sys.path.append(snowboy_location)
    import snowboydetect
    sys.path.pop()

    detector = snowboydetect.SnowboyDetect(
        resource_filename=os.path.join(snowboy_location, "resources", "common.res").encode(),
        model_str=",".join(snowboy_hot_word_files).encode()
    )
    detector.SetAudioGain(1.0)
    detector.SetSensitivity(",".join(["0.4"] * len(snowboy_hot_word_files)).encode())
    snowboy_sample_rate = detector.SampleRate()

    elapsed_time = 0
    seconds_per_buffer = float(source.CHUNK) / source.SAMPLE_RATE
    resampling_state = None

    # buffers capable of holding 5 seconds of original and resampled audio
    five_seconds_buffer_count = int(math.ceil(5 / seconds_per_buffer))
    frames = collections.deque(maxlen=five_seconds_buffer_count)
    resampled_frames = collections.deque(maxlen=five_seconds_buffer_count)
    while True:
        elapsed_time += seconds_per_buffer
        if timeout and elapsed_time > timeout:
            raise WaitTimeoutError("listening timed out while waiting for hotword to be said")

        buffer = source.stream.read(source.CHUNK)
        if len(buffer) == 0: break  # reached end of the stream
        frames.append(buffer)

        # resample audio to the required sample rate
        resampled_buffer, resampling_state = audioop.ratecv(buffer, source.SAMPLE_WIDTH, 1, source.SAMPLE_RATE, snowboy_sample_rate, resampling_state)
        resampled_frames.append(resampled_buffer)

        # run Snowboy on the resampled audio
        snowboy_result = detector.RunDetection(b"".join(resampled_frames))
        assert snowboy_result != -1, "Error initializing streams or reading audio data"
        if snowboy_result > 0: break  # wake word found

    return b"".join(frames), elapsed_time
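
The method above keeps at most five seconds of audio in two bounded deques, re-runs detection on the whole resampled window each iteration, and returns the joined window as one buffer. A minimal sketch of just the bounded-window part follows. The function name, the four-byte dummy chunks and the chunk size of 1024 are illustrative assumptions, and no audio or Snowboy call is involved.

```python
import collections
import math


def rolling_window(chunks, chunk_size, sample_rate, window_seconds=5):
    """Keep only the most recent `window_seconds` worth of chunks.

    Same sizing rule as five_seconds_buffer_count in the method above;
    `chunks` is any iterable of bytes objects (dummy data here).
    """
    seconds_per_buffer = float(chunk_size) / sample_rate
    max_buffers = int(math.ceil(window_seconds / seconds_per_buffer))
    frames = collections.deque(maxlen=max_buffers)
    for chunk in chunks:
        frames.append(chunk)  # the oldest chunk drops out once the deque is full
    return max_buffers, b"".join(frames)


if __name__ == "__main__":
    dummy_chunks = [bytes([i % 256]) * 4 for i in range(300)]
    max_buffers, window = rolling_window(dummy_chunks, 1024, 48000)
    print(max_buffers, len(window))
```

With these assumed values the window holds 235 buffers, so the single buffer that listen() appends to frames after a detection can contain up to about five seconds of audio that preceded the hotword.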

I'm running Python 3.7 on a Raspberry Pi 3B+ with Raspbian Buster Lite (kernel 4.19.36). Please ask if there is any other information I can provide.
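
One extra piece of information that might help narrow this down is the per-buffer energy the microphone reports on this hardware, since both loops in listen() compare audioop.rms(buffer, source.SAMPLE_WIDTH) against energy_threshold. The helper below is a hypothetical stand-in that approximates that RMS value in pure Python, assuming 16-bit little-endian mono samples (an assumption about the Microphone stream format). The sample values are made up.

```python
import math
import struct


def rms_16bit(buffer):
    """Approximate the RMS energy of 16-bit little-endian PCM audio.

    Assumes len(buffer) is a multiple of 2; returns 0 for an empty buffer.
    """
    samples = [value for (value,) in struct.iter_unpack("<h", buffer)]
    if not samples:
        return 0
    mean_square = sum(s * s for s in samples) / len(samples)
    return int(math.sqrt(mean_square))


if __name__ == "__main__":
    loud = struct.pack("<4h", 1000, -1000, 1000, -1000)
    quiet = struct.pack("<4h", 0, 0, 0, 0)
    print(rms_16bit(loud), rms_16bit(quiet))
```

Logging a value like this for each chunk read from source.stream, next to the recognizer's energy_threshold, would show whether speech after the hotword ever rises above the threshold.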

Tags: python-3.x, raspberry-pi, speech-recognition, raspbian, snowboy
