Extract all substrings from a string that are between a word and a delimiter using regex in C++

Problem description

I have the following query:

std::string query =
"ODR+1"
"DPT+::SFO"
"ARR+::MKE"
"ODR+2"
"DPT+::MKE"
"ARR+::SFO";

From every segment that starts with ARR or DPT, I am trying to extract the value after ::. I wrote the regular expression [DPT|ARR]\+\:\:(.*), and it worked when I tested it on regex101.

When I ran the following C++ code, I got this output:

DPT+::SFO'ARR+::MKE'ODR+2'DPT+::MKE'ARR+::SFO'

The output is wrong. I really just want to extract SFO and MKE. How can I modify the regex to extract just these values?


#include <regex>
#include <iostream>
#include <string>

int main()
{
    std::string query =
    "ODR+1'"
    "DPT+::SFO'"
    "ARR+::MKE'"
    "ODR+2'"
    "DPT+::MKE'"
    "ARR+::SFO'";
    
    std::regex regulaExpression("(DPT|ARR).*::(.*)\\'");

    std::sregex_iterator iter(query.begin(), query.end(), regulaExpression);
    std::sregex_iterator end;

    while(iter != end)
    {
        std::cout << iter->str() << std::endl;
        ++iter;
    }
}

UPDATE

I updated the code:

#include <regex>
#include <iostream>
#include <cstring>

int main()
{  
    const char *target  =
            "ODR+1'"
            "DPT+::SFO'"
            "ARR+::MKE'"
            "ODR+2'"
            "DPT+::MKE'"
            "ARR+::SFO'";

    std::regex rgx("(DPT|ARR).*?::(.*?)'");
    for(auto it = std::cregex_iterator(target, target + std::strlen(target), rgx);
             it != std::cregex_iterator();
           ++it)
    {
        std::cmatch match = *it;
        std::cout << match[2].str() << '\n';
    }
    
    return 0;
}

Now it retrieves the following, which is exactly what I want, but I don't know why it works.

SFO
MKE
MKE
SFO

It worked, but why did I have to use std::cout << match[2].str() << '\n';?

Tags: c++, regex

Solution


The problem is in your regular expression:

(DPT|ARR).*?::(.*?)'

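The first part, (DPT|ARR), matches segments starting with DPT or ARR, but the parentheses also capture that prefix, so it is stored as the first group of the result, match[1] (match[0] is always the whole match). That is why your value ends up in match[2]. To avoid this, use a non-capturing group: (?: )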

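The second part, .*?, is the fragile one: . matches any character, including : and ', so nothing ties the match to a single segment. The lazy quantifier does stop at the first :: it meets, but if a segment has no ::, the match runs on into the following segments. It is safer to match everything except :, and also everything except ' (so that a malformed segment cannot spill into the others): (?:[^':]*:)+:
This first matches everything up to and including a :, then checks that another : follows immediately. If you are sure this part never contains a single :, you can simplify it.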

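Finally, the string you want is captured by ([^']*), which matches everything up to the first '. These are now the only capturing parentheses, so the value is available as match[1]: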

(?:DPT|ARR)(?:[^':]*:)+:([^']*)
