How to make strstr() match whole words only, so it does not catch unwanted substrings

Question

For example, if the string is "Only for Geeky People" and I am looking only for the substring "Geek", not "Geeky", it should say that the word is not present.

In other words, I want strstr("Only for Geeky People", "Geek") to return NULL.

How can I solve this kind of problem?

Tags: c, strstr

Answer


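You have to handle it by wrapping strstr() in a function, perhaps called str_word(), which performs extra checks after strstr() finds a candidate match. (That name avoids the identifiers the C standard reserves for <string.h>: names beginning with str followed by a lowercase letter.) At least, that is probably the most sensible way to handle it.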

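Padding the search string with spaces will not work. Leading padding (" Geek") prevents the code from finding the word in "Geek" or "(Geek is not pejorative)"; trailing padding ("Geek ") prevents it from finding the word in "Ozymandias is a Geek"; and so on. If you want to go over the top, you could consider a powerful regular-expression library such as PCRE, but that is overkill for this task (and standard POSIX <regex.h> is not powerful enough: its regular-expression syntax has no word-boundary assertion).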

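char *str_word(char *haystack, const char *needle)
{
    char *from = haystack;
    size_t length = strlen(needle);
    char *found;

    if (length == 0)                /* an empty needle is not a word */
        return NULL;
    while ((found = strstr(from, needle)) != NULL)
    {
        /* Reject a match preceded or followed by a letter: it is part
           of a longer word such as "Geeky". Resume just past it. */
        if (found > haystack && isalpha((unsigned char)found[-1]))
            from = found + 1;
        else if (isalpha((unsigned char)found[length]))
            from = found + 1;
        else
            return found;
    }
    return NULL;
}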

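Note that this lets the function find the standalone "Geek" in "Ozymandias is such a Geeky Geek": it rejects the match inside "Geeky" and keeps searching.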

Beware of trying to add const-correctness to this. You could easily use:

const char *str_word(const char *haystack, const char *needle);

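However, you cannot return a non-const char * when you are passed a const char * without casting away the const-ness somewhere along the line. Returning a const char * punts the removal of const-ness to the calling code. That matters in situations such as: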

char *word = str_word(line, "Geek");

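where line is a modifiable array holding a line of input: you want to search that line for the word and get back a non-const pointer into it.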

Test code:

#include <ctype.h>
#include <stdio.h>
#include <string.h>

extern char *str_word(char *haystack, const char *needle);

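char *str_word(char *haystack, const char *needle)
{
    char *from = haystack;
    size_t length = strlen(needle);
    char *found;

    if (length == 0)                /* an empty needle is not a word */
        return NULL;
    while ((found = strstr(from, needle)) != NULL)
    {
        /* Reject a match preceded or followed by a letter: it is part
           of a longer word such as "Geeky". Resume just past it. */
        if (found > haystack && isalpha((unsigned char)found[-1]))
            from = found + 1;
        else if (isalpha((unsigned char)found[length]))
            from = found + 1;
        else
            return found;
    }
    return NULL;
}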

int main(void)
{
    const char search[] = "Geek";
    char haystacks[][64] =
    {
        "Geek",
        "(Geek is not pejorative)",
        "Ozymandias is a Geek",
        "Ozymandias is such a Geeky Geek",
        "No prizes for Geekiness",
        "Only for Geeky people",
        "Howling 'Geek' gets you nowhere",
        "A Geek is a human",
        "Geeky people run the tech world",
    };
    enum { NUM_HAYSTACKS = sizeof(haystacks) / sizeof(haystacks[0]) };

    for (int i = 0; i < NUM_HAYSTACKS; i++)
    {
        char *word = str_word(haystacks[i], search);
        if (word == NULL)
            printf("Did not find '%s' in [%s]\n", search, haystacks[i]);
        else
            printf("Found '%s' at [%s] in [%s]\n", search, word, haystacks[i]);
    }

    return 0;
}

Test results:

Found 'Geek' at [Geek] in [Geek]
Found 'Geek' at [Geek is not pejorative)] in [(Geek is not pejorative)]
Found 'Geek' at [Geek] in [Ozymandias is a Geek]
Found 'Geek' at [Geek] in [Ozymandias is such a Geeky Geek]
Did not find 'Geek' in [No prizes for Geekiness]
Did not find 'Geek' in [Only for Geeky people]
Found 'Geek' at [Geek' gets you nowhere] in [Howling 'Geek' gets you nowhere]
Found 'Geek' at [Geek is a human] in [A Geek is a human]
Did not find 'Geek' in [Geeky people run the tech world]
