Regular expression for an empty string

Problem description

I have a string:

s=

"(2021-06-29T10:53:42.647Z) [Denis]: hi
(2021-06-29T10:54:53.693Z) [Nicholas]: TA FOR SHOWING
(2021-06-29T11:58:29.053Z) [Nicholas]: how are you bane 
(2021-06-29T11:58:29.053Z) [Nicholas]: 
(2021-06-29T11:58:29.053Z) [Nicholas]: #END_REMOTE#
(2021-06-30T08:07:42.029Z) [Denis]: VAL 01JUL2021
(2021-06-30T08:07:42.029Z) [Denis]: ##ENDED AT 08:07 GMT##"

I want to extract the comment text from it. The expected output is:

comments=['hi','TA FOR SHOWING','how are you bane',' ','#END_REMOTE#','VAL 01JUL2021','##ENDED AT 08:07 GMT##'] 

What I have tried:

comments=re.findall(r']:\s+(.*?)\n',s) 

The regex works well, but I cannot get the blank comment back as ''.

Tags: python-3.x, regex, string

Solution


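You can exclude ] from the capture group by using a negated character class. If you also want to match the value on the last line, assert the end of the line with $ (with re.MULTILINE enabled) instead of requiring a newline \n.

Note that \s can match a newline, and the negated character class [^]]* can match a newline as well.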

]:\s+([^]]*)$
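
To see why the original attempt loses the blank comment, here is a minimal sketch on a shortened sample. The sample string and the names `attempt` and `fixed` are made up for illustration and are not part of the question.

```python
import re

# Hypothetical shortened sample; the line after the blank comment ends with "\n".
sample = "[A]: hi\n[B]: \n[C]: bye\n[D]: end"

# Original attempt: \s+ consumes " \n" after [B], so the lazy group
# captures the whole next line, and the last line (no trailing "\n") is lost.
attempt = re.findall(r"]:\s+(.*?)\n", sample)

# Anchored version: backtracking makes \s+ give the newline back so that
# $ can match at the end of the blank line, leaving the group empty.
fixed = re.findall(r"]:\s+([^]]*)$", sample, re.MULTILINE)

print(attempt)  # ['hi', '[C]: bye']
print(fixed)    # ['hi', '', 'bye', 'end']
```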


import re

regex = r"]:\s+([^]]*)$"

s = ("(2021-06-29T10:53:42.647Z) [Denis]: hi\n"
    "(2021-06-29T10:54:53.693Z) [Nicholas]: TA FOR SHOWING\n"
    "(2021-06-29T11:58:29.053Z) [Nicholas]: how are you bane \n"
    "(2021-06-29T11:58:29.053Z) [Nicholas]: \n"
    "(2021-06-29T11:58:29.053Z) [Nicholas]: #END_REMOTE#\n"
    "(2021-06-30T08:07:42.029Z) [Denis]: VAL 01JUL2021\n"
    "(2021-06-30T08:07:42.029Z) [Denis]: ##ENDED AT 08:07 GMT##")

print(re.findall(regex, s, re.MULTILINE))

Output

['hi', 'TA FOR SHOWING', 'how are you bane ', '', '#END_REMOTE#', 'VAL 01JUL2021', '##ENDED AT 08:07 GMT##'] 
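
The captured values keep any trailing whitespace from the line (for example 'how are you bane '). If that is not wanted, one possible follow-up step is to strip each match. This is a sketch on a made-up shortened sample, not part of the original answer.

```python
import re

# Hypothetical shortened sample with a trailing space and a blank comment.
sample = "[A]: how are you bane \n[B]: \n[C]: bye"

# Find the comments, then strip surrounding whitespace from each one.
matches = re.findall(r"]:\s+([^]]*)$", sample, re.MULTILINE)
comments = [m.strip() for m in matches]

print(comments)  # ['how are you bane', '', 'bye']
```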

If you do not want the match to cross lines:

]:[^\S\n]+([^]\n]*)$
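
The difference between the two patterns shows up when a blank comment is followed by a line that has no [name]: prefix. The sample string and variable names in this sketch are made up for illustration.

```python
import re

# Hypothetical input: a blank comment followed by a line without a "[name]:" prefix.
sample = "[A]: \nstray line\n[B]: ok"

# \s+ and [^]]* may both consume "\n", so the match runs on into the next line.
crossing = re.findall(r"]:\s+([^]]*)$", sample, re.MULTILINE)

# [^\S\n] matches whitespace other than a newline, and [^]\n] also excludes
# the newline, so each match stays on its own line.
same_line = re.findall(r"]:[^\S\n]+([^]\n]*)$", sample, re.MULTILINE)

print(crossing)   # ['stray line', 'ok']
print(same_line)  # ['', 'ok']
```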
