How to use a regular expression to include line breaks in the extracted results

Problem description

I am working with a text file of messages similar to this one (although much longer):

13/09/18, 4:14 pm - Fred Dag: Jackie, please could you send to me too? ‚ thank you
Hello
13/09/18, 4:45 pm - Jackie Johnson: Here is yet another message
where someone added a line break
13/09/18, 4:10 pm - Fred Dag: Here is another message

The following regular expression extracts the data into date, time, name, and message, except when a message contains a line break:

(?<date>(?:[0-9]{1,2}\/){2}[0-9]{1,2}),\s(?<time>(?:[0-9]{1,2}:)[0-9]{2}\s[a|p]m)\s-\s(?<name>(?:.*)):\s(?<message>(?:.+))

Using preg_match_all with the regular expression above in PHP 7.4, I get the following array:

Array
(
    [0] => Array
        (
            [date] => 13/09/18
            [time] => 4:14 pm
            [name] => Fred Dag
            [message] => Jackie, please could you send to me too? ‚ thank you
        )

    [1] => Array
        (
            [date] => 13/09/18
            [time] => 4:45 pm
            [name] => Jackie Johnson
            [message] => Here is yet another message
        )

    [2] => Array
        (
            [date] => 13/09/18
            [time] => 4:10 pm
            [name] => Fred Dag
            [message] => Here is another message
        )

)

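However, the array is missing the lines that follow a line break and should have been appended to the preceding message. I get the same result when experimenting on regex101.com.

I think I have exhausted my knowledge of regular expressions and, with the search terms I know, reached the end of Google :) Can anyone point me in the right direction?

Tags: regex, line-breaks, preg-match-all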

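Solution

Your immediate problem appears to be that the dot you use to match the message content does not match line breaks. This is easily fixed by adding the dot-all flag, /s, to the PHP regular expression. Beyond that, though, I think the pattern itself needs to change. I suggest the following: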

\d{2}\/\d{2}\/\d{2}, \d{1,2}:\d{1,2}.*?(?=\d{2}\/\d{2}\/\d{2}, \d{1,2}:\d{1,2}|$)

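This pattern matches a message from its opening date, continues across line breaks, and stops when it reaches the start of the next message or the end of the input.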
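
To make the effect of the dot-all flag concrete, here is a minimal sketch that runs the same pattern with and without /s. The two-line string and the variable names ($text, $withoutS, $withS) are made up for this illustration and do not come from the question.

```php
<?php
// Hypothetical two-line text, used only for this demonstration.
$text = "first line\nsecond line";

// Without the s flag, the dot does not match "\n", so .* stops at the end of the first line.
preg_match('/first.*/', $text, $withoutS);

// With the s (dot-all) flag, the dot also matches "\n", so .* runs to the end of the text.
preg_match('/first.*/s', $text, $withS);

var_dump($withoutS[0]);
var_dump($withS[0]);
```

The first call captures only the first line, while the second captures both lines. This is why the lazy .*? in the suggested pattern needs /s to reach the continuation lines of a message.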

Sample script:

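// Sample input from the question; "\n" marks the line breaks inside messages.
$input = "13/09/18, 4:14 pm - Fred Dag: Jackie, please could you send to me too? ‚ thank you\nHello\n13/09/18, 4:45 pm - Jackie Johnson: Here is yet another message\nwhere someone added a line break\n13/09/18, 4:10 pm - Fred Dag: Here is another message";
// The s (dot-all) flag lets .*? cross line breaks; the lookahead stops each match before the next timestamp or at the end of the input.
preg_match_all("/\d{2}\/\d{2}\/\d{2}, \d{1,2}:\d{1,2}.*?(?=\d{2}\/\d{2}\/\d{2}, \d{1,2}:\d{1,2}|$)/s", $input, $matches);
// $matches[0] holds the full text of each matched message.
print_r($matches[0]);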

This prints:

Array
(
    [0] => 13/09/18, 4:14 pm - Fred Dag: Jackie, please could you send to me too? ‚ thank you
    Hello

    [1] => 13/09/18, 4:45 pm - Jackie Johnson: Here is yet another message
    where someone added a line break

    [2] => 13/09/18, 4:10 pm - Fred Dag: Here is another message
)
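
The script above returns each message as one string. If the date, time, name, and message fields from the question are still wanted, one possible approach is to combine named groups with the same dot-all and lookahead idea. The sketch below is an assumption rather than part of the original answer: the pattern and the $pattern and $messages names are hypothetical. It also assumes that every message starts at the beginning of a line in the form "date, time am/pm - name: " and that names contain no colon.

```php
<?php
// Sample input from the question; "\n" marks the line breaks inside messages.
$input = "13/09/18, 4:14 pm - Fred Dag: Jackie, please could you send to me too? ‚ thank you\nHello\n"
       . "13/09/18, 4:45 pm - Jackie Johnson: Here is yet another message\nwhere someone added a line break\n"
       . "13/09/18, 4:10 pm - Fred Dag: Here is another message";

// Assumed pattern (not from the original answer):
//   m flag: ^ matches at the start of every line
//   s flag: the dot also matches line breaks
// The message is matched lazily and ends just before a line break that is
// followed by the next "date, time - " header, or at the very end (\z).
$pattern = '~^(?<date>\d{1,2}/\d{1,2}/\d{2}), (?<time>\d{1,2}:\d{2} [ap]m) - (?<name>[^:]+): '
         . '(?<message>.*?)(?=\R\d{1,2}/\d{1,2}/\d{2}, \d{1,2}:\d{2} [ap]m - |\z)~ms';

// PREG_SET_ORDER groups the results per message instead of per capture group.
preg_match_all($pattern, $input, $matches, PREG_SET_ORDER);

// Keep only the named fields of each match.
$messages = [];
foreach ($matches as $m) {
    $messages[] = [
        'date'    => $m['date'],
        'time'    => $m['time'],
        'name'    => $m['name'],
        'message' => $m['message'],
    ];
}

print_r($messages);
```

With the sample input, this should yield three entries whose message field keeps the continuation line. A message line that itself begins with something that looks like a "date, time am/pm - " header would be treated as the start of a new message, so the lookahead may need tightening for real data.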
