c - How to make strstr efficient so that it does not capture unwanted substrings
Question
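For example, if the string is "Only for Geeky People" and I am looking only for the word "Geek", not "Geeky", the search should report that the word is not present. That is, I want strstr("Only for Geeky People", "Geek") to yield NULL.
How can I solve a problem like this?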
Answer
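You will have to handle it by wrapping strstr() in a function, perhaps str_word() (a name that avoids the reserved identifiers), which makes extra checks after it finds the word. Or, at least, that is probably the most sensible way to deal with it.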
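Padding the search string with blanks will not work. Leading padding prevents the code from finding the word in "Geek" or "(Geek is not pejorative)"; trailing padding prevents it from finding the word in "Ozymandias is a Geek"; and so on. If you want to go over the top, you could consider a powerful regular expression library such as PCRE, but that is overkill for this task (and POSIX <regex.h> is not powerful enough, because POSIX regular expressions do not define a word-boundary anchor).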
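To make the padding problem concrete, here is a minimal sketch that pads the word with a blank on both sides before calling strstr(). The helper name find_padded and the 64-byte buffer are assumptions made for illustration; they are not part of the answer.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: search for the word with a blank added on each side. */
const char *find_padded(const char *haystack, const char *word)
{
    char padded[64];    /* assumed large enough for short words */

    assert(haystack != NULL && word != NULL);
    /* snprintf returns the length it wanted to write; reject truncation. */
    int n = snprintf(padded, sizeof(padded), " %s ", word);
    if (n < 0 || (size_t)n >= sizeof(padded))
        return NULL;
    return strstr(haystack, padded);
}
```

With this helper, "A Geek is a human" matches, but "Geek", "(Geek is not pejorative)" and "Ozymandias is a Geek" do not, because the word sits at the start or end of the string or next to punctuation rather than between two blanks.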
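/* Find needle in haystack as a whole word: no letter directly before or after it. */
char *str_word(char *haystack, const char *needle)
{
    char *from = haystack;
    size_t length = strlen(needle);
    char *found;

    if (length == 0)
        return NULL;    /* an empty needle is not a word; also keeps the loop finite */
    while ((found = strstr(from, needle)) != NULL)
    {
        if (found > haystack && isalpha((unsigned char)found[-1]))
            from = found + 1;   /* letter before the match: retry just past its start */
        else if (isalpha((unsigned char)found[length]))
            from = found + 1;   /* letter after the match: retry just past its start */
        else
            return found;
    }
    return NULL;
}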
Note that this allows the function to find the word "Geek" in "Ozymandias is such a Geeky Geek": the rejected "Geeky" does not stop the search.
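The same boundary checks can be factored into a predicate, which makes it easy to scan every candidate match rather than stop at the first accepted one. This is only a sketch: is_whole_word and count_word are hypothetical names, and treating an empty needle as matching nothing is an assumption.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical predicate: is the match at `found` delimited by non-letters? */
static int is_whole_word(const char *haystack, const char *found, size_t length)
{
    if (found > haystack && isalpha((unsigned char)found[-1]))
        return 0;   /* letter immediately before the match */
    if (isalpha((unsigned char)found[length]))
        return 0;   /* letter immediately after the match */
    return 1;
}

/* Hypothetical helper: count the whole-word occurrences of needle. */
size_t count_word(const char *haystack, const char *needle)
{
    assert(haystack != NULL && needle != NULL);
    size_t length = strlen(needle);
    size_t count = 0;
    const char *found = haystack;

    if (length == 0)
        return 0;   /* assumption: an empty needle matches nothing */
    while ((found = strstr(found, needle)) != NULL)
    {
        if (is_whole_word(haystack, found, length))
            count++;
        found++;    /* resume one character past this candidate */
    }
    return count;
}
```

For "Ozymandias is such a Geeky Geek" the scan sees two candidates, rejects the one inside "Geeky" and counts the final "Geek".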
Be careful when trying to add const-correctness to this. It is easy enough to declare:
const char *str_word(const char *haystack, const char *needle);
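However, you cannot return a non-const char * when passed a const char * without forcibly casting away the const-ness somewhere along the line. Returning a const char * punts the removal of const-ness to the calling code. This matters in cases such as: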
char *word = str_word(line, "Geek");
Here you have a modifiable array holding a line of input; you want to search that line for the word and get back a non-const pointer.
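One way to reconcile the two needs, sketched here under assumptions, is to keep a const-correct worker and add a thin wrapper for modifiable buffers, so that the single cast that removes const lives in one place. The name str_word_const is hypothetical, the loop restarts one character past a rejected match, and an empty needle is treated as not found.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Const-correct worker (hypothetical name): never modifies the haystack. */
const char *str_word_const(const char *haystack, const char *needle)
{
    assert(haystack != NULL && needle != NULL);
    size_t length = strlen(needle);
    const char *from = haystack;
    const char *found;

    if (length == 0)
        return NULL;    /* assumption: an empty needle is not a word */
    while ((found = strstr(from, needle)) != NULL)
    {
        int alpha_before = found > haystack && isalpha((unsigned char)found[-1]);
        int alpha_after = isalpha((unsigned char)found[length]);
        if (!alpha_before && !alpha_after)
            return found;
        from = found + 1;   /* retry just past the start of the rejected match */
    }
    return NULL;
}

/* Wrapper for modifiable buffers: the one cast that removes const lives here. */
char *str_word(char *haystack, const char *needle)
{
    return (char *)str_word_const(haystack, needle);
}
```

The cast in the wrapper is safe because the caller passed a non-const buffer in the first place; code that only has a const char * calls str_word_const() and gets a const pointer back.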
Test code:
#include <ctype.h>
#include <stdio.h>
#include <string.h>
extern char *str_word(char *haystack, const char *needle);
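/* Find needle in haystack as a whole word: no letter directly before or after it. */
char *str_word(char *haystack, const char *needle)
{
    char *from = haystack;
    size_t length = strlen(needle);
    char *found;

    if (length == 0)
        return NULL;    /* an empty needle is not a word; also keeps the loop finite */
    while ((found = strstr(from, needle)) != NULL)
    {
        if (found > haystack && isalpha((unsigned char)found[-1]))
            from = found + 1;   /* letter before the match: retry just past its start */
        else if (isalpha((unsigned char)found[length]))
            from = found + 1;   /* letter after the match: retry just past its start */
        else
            return found;
    }
    return NULL;
}
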
int main(void)
{
    const char search[] = "Geek";
    char haystacks[][64] =
    {
        "Geek",
        "(Geek is not pejorative)",
        "Ozymandias is a Geek",
        "Ozymandias is such a Geeky Geek",
        "No prizes for Geekiness",
        "Only for Geeky people",
        "Howling 'Geek' gets you nowhere",
        "A Geek is a human",
        "Geeky people run the tech world",
    };
    enum { NUM_HAYSTACKS = sizeof(haystacks) / sizeof(haystacks[0]) };

    for (int i = 0; i < NUM_HAYSTACKS; i++)
    {
        char *word = str_word(haystacks[i], search);
        if (word == NULL)
            printf("Did not find '%s' in [%s]\n", search, haystacks[i]);
        else
            printf("Found '%s' at [%s] in [%s]\n", search, word, haystacks[i]);
    }
    return 0;
}
Test results:
Found 'Geek' at [Geek] in [Geek]
Found 'Geek' at [Geek is not pejorative)] in [(Geek is not pejorative)]
Found 'Geek' at [Geek] in [Ozymandias is a Geek]
Found 'Geek' at [Geek] in [Ozymandias is such a Geeky Geek]
Did not find 'Geek' in [No prizes for Geekiness]
Did not find 'Geek' in [Only for Geeky people]
Found 'Geek' at [Geek' gets you nowhere] in [Howling 'Geek' gets you nowhere]
Found 'Geek' at [Geek is a human] in [A Geek is a human]
Did not find 'Geek' in [Geeky people run the tech world]