Keep only specific substrings using a regular expression

Problem description

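Given a Python string, I want to keep only the substrings that match this pattern: any two characters, an underscore, then any four characters. For example, given the string "AX_45TH (23) - JK_I KL_9056", I want to return only AX_45TH and KL_9056.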

A few more examples of inputs and expected outputs:

Input: "12_MKTY (BLUE), RED YU_MKT6" Output: ["12_MKTY", "YU_MKT6"]

Input: "12_MKT (BLUE), RED YU_MKT6" Output: ["YU_MKT6"]

Input: "12_M (BLUE), RED YU_MKT6" Output: ["YU_MKT6"]

Tags: python, regex

Solution


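See [Python.Docs]: re - Regular expression operations. The pattern below interprets "any character" as an ASCII letter or digit.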

>>> import re
>>>
>>>
>>> pat = re.compile("[0-9A-Za-z]{2}_[0-9A-Za-z]{4}")
>>>
>>> for text in ["AX_45TH (23) - JK_I KL_9056", "12_MKTY (BLUE), RED YU_MKT6", "12_MKT (BLUE), RED YU_MKT6", "12_M (BLUE), RED YU_MKT6"]:
...     print("{0:s}: {1:}".format(text, pat.findall(text)))
...
AX_45TH (23) - JK_I KL_9056: ['AX_45TH', 'KL_9056']
12_MKTY (BLUE), RED YU_MKT6: ['12_MKTY', 'YU_MKT6']
12_MKT (BLUE), RED YU_MKT6: ['YU_MKT6']
12_M (BLUE), RED YU_MKT6: ['YU_MKT6']
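
One caveat with the pattern above: it can also match inside a longer token, so "ABC_12345" would yield 'BC_1234'. If whole tokens are what is wanted, a possible variant is to add \b word-boundary anchors. The sketch below is an assumption about the intent, not part of the original answer, and the names TOKEN_RE and extract_tokens are hypothetical:

```python
import re

# Same character classes as the answer's pattern; the \b anchors are an
# addition (assumption: only whole, standalone tokens should be kept).
TOKEN_RE = re.compile(r"\b[0-9A-Za-z]{2}_[0-9A-Za-z]{4}\b")


def extract_tokens(text):
    """Return every standalone 'XX_XXXX' token found in text, in order."""
    return TOKEN_RE.findall(text)


if __name__ == "__main__":
    samples = [
        "AX_45TH (23) - JK_I KL_9056",
        "12_MKT (BLUE), RED YU_MKT6",
        "ABC_12345",  # longer token: no match once the anchors are in place
    ]
    for sample in samples:
        print("{0}: {1}".format(sample, extract_tokens(sample)))
```

Because the underscore counts as a word character, \b also rejects candidates that are glued to a neighbouring underscore or alphanumeric character. For the inputs in the question, the results are the same as with the unanchored pattern.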
