Searching for a byte pattern in a buffered data stream

Problem description

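I want to search for a byte pattern in data that I receive in chunks (serially) as it becomes available. For example, the byte pattern 0xbbffbbffbb. There is no guarantee that the pattern arrives complete within a single chunk, so a simple strnstr is probably not the solution. What algorithm can I use to find this pattern? My approach is to look for the first byte (0xbb in this case), make sure I have 4 more bytes, and then compare them against the pattern. However, this fails if garbage follows after two bytes, as in 0xbbff01[bbffbbffbb].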

My code (sorry if it is shabby) looks like this:

char* pattern_search(char* buff, size_t *bytes_read)
{
    char* ptr = buff;
    uint16_t remaining_length = *bytes_read;

    while(1) {

        // look for one byte in the stream
        char* pattern_start = memmem((void*)ptr, remaining_length, 0xbb, 1);

        if (pattern_start == NULL) {
            // printf("nothing found\n");
            return NULL;
        }

        int pos = pattern_start - ptr;
        remaining_length = remaining_length - pos;
        ptr = pattern_start;

        // see if you have 5 bytes to compare, if not get more
        remaining_length += get_additional_bytes();

        // compare 5 bytes for pattern
        pattern_start = memmem((void*)ptr, remaining_length, start_flag, PATTERN_LEN);
        if (pattern_start == NULL) {
            // move one step and continue search
            ptr++;
            remaining_length--;
            // move these bytes back to beginning of the buffer
            memcpy(buff, ptr, remaining_length);
            ptr = buff;
            *bytes_read = remaining_length;
            if (remaining_length > 0) {
                continue;
            } else {
                return NULL;
            }
        } else {
            // found!
            printf("pattern found!\n");
            ptr = pattern_start;
            break;
        }
    }

    return ptr;
}

Tags: c, pattern-matching

Solution


There are certainly many different ways to solve this. One possibility is:

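  • Specify the pattern as an array of unsigned char
  • Call an "input_received" function with each received chunk of data and a pointer to a callback function; the callback is invoked whenever the pattern is found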

It could look like this:

#include <stdio.h>

static unsigned const char PATTERN[] = {0xbb, 0xff, 0xbb, 0xff, 0xbb};

static void found(size_t pos) {
    printf("pattern found at index %zu\n", pos);
}

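static void input_received(const unsigned char *const data,
                           int n,
                           void (*callback)(size_t)) {
    // Both counters are static so that a match can span several calls.
    static int match_count;   // pattern bytes matched so far
    static size_t position;   // absolute index of data[i] in the whole stream

    for (int i = 0; i < n; i++, position++) {
        if (data[i] == PATTERN[match_count]) {
            match_count++;
        } else {
            // Mismatch: restart, counting this byte if it begins a new match.
            match_count = data[i] == PATTERN[0] ? 1 : 0;
        }
        if (match_count == sizeof PATTERN) {
            // Report the stream index at which the match started.
            (*callback)(position - sizeof PATTERN + 1);
            match_count = 0;
        }
    }
}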

int main(void) {

    unsigned char input[] = {0xff, 0x01, 0x02, 0xff, 0x00,
                             0xbb, 0xff, 0xbb, 0xff, 0xbb,
                             0xbb, 0xff, 0xbb, 0xff, 0xbb};

    input_received(input, 2, found);
    input_received(&input[2], 3, found);
    input_received(&input[5], 2, found);
    input_received(&input[7], 2, found);
    input_received(&input[9], 5, found);
    input_received(&input[14], 1, found);

    return 0;
}

Test

This prints the following to the debug console:

pattern found at index 5
pattern found at index 10
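
One caveat about the restart rule in input_received: on a mismatch it only checks whether the current byte equals PATTERN[0]. That is enough for 0xbbffbbffbb, but it can miss matches for other patterns. For example, with the pattern {0xaa, 0xaa, 0xbb}, the input 0xaa 0xaa 0xaa 0xbb would not be reported. A possible generalization is to fall back through a Knuth-Morris-Pratt failure table instead. The following is only a sketch of that idea, not part of the original answer; the names stream_matcher, matcher_init, matcher_feed, first_match and demo_question_stream are made up for illustration. The demo feeds the problematic stream from the question, 0xbbff01[bbffbbffbb], in uneven chunks.

```c
#include <stdio.h>
#include <stddef.h>

static const unsigned char PATTERN[] = {0xbb, 0xff, 0xbb, 0xff, 0xbb};
#define PATTERN_LEN (sizeof PATTERN)

/* Hypothetical matcher state; it persists between chunks. */
typedef struct {
    size_t fail[PATTERN_LEN]; /* fail[k]: length of the longest proper prefix
                                 of PATTERN[0..k] that is also its suffix */
    size_t matched;           /* pattern bytes matched so far */
    size_t position;          /* absolute index of the next byte in the stream */
} stream_matcher;

/* Build the KMP failure table and reset the progress counters. */
void matcher_init(stream_matcher *m) {
    size_t k = 0;
    m->fail[0] = 0;
    for (size_t i = 1; i < PATTERN_LEN; i++) {
        while (k > 0 && PATTERN[i] != PATTERN[k])
            k = m->fail[k - 1];
        if (PATTERN[i] == PATTERN[k])
            k++;
        m->fail[i] = k;
    }
    m->matched = 0;
    m->position = 0;
}

/* Process one received chunk; callback gets the start index of each match. */
void matcher_feed(stream_matcher *m, const unsigned char *data, size_t n,
                  void (*callback)(size_t)) {
    for (size_t i = 0; i < n; i++, m->position++) {
        /* On a mismatch, fall back to the longest prefix that still fits. */
        while (m->matched > 0 && data[i] != PATTERN[m->matched])
            m->matched = m->fail[m->matched - 1];
        if (data[i] == PATTERN[m->matched])
            m->matched++;
        if (m->matched == PATTERN_LEN) {
            callback(m->position - PATTERN_LEN + 1);
            /* Restart from scratch, as input_received does, so overlapping
               matches are not reported. */
            m->matched = 0;
        }
    }
}

/* Hypothetical bookkeeping: remembers the first reported index (-1 if none). */
static long first_match = -1;

void found(size_t pos) {
    printf("pattern found at index %zu\n", pos);
    if (first_match < 0)
        first_match = (long)pos;
}

/* Feed 0xbb 0xff 0x01 followed by the full pattern, split into three chunks. */
long demo_question_stream(void) {
    static const unsigned char input[] = {0xbb, 0xff, 0x01,
                                          0xbb, 0xff, 0xbb, 0xff, 0xbb};
    stream_matcher m;
    matcher_init(&m);
    matcher_feed(&m, input, 2, found);
    matcher_feed(&m, &input[2], 3, found);
    matcher_feed(&m, &input[5], 3, found);
    return first_match;
}
```

Calling demo_question_stream() from a main function should print "pattern found at index 3" and return 3, since the garbage byte 0x01 only resets the partial match. Keeping the state in a struct rather than in static locals is a design choice that allows more than one stream to be scanned at a time.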
