regex, on match strip and capture?

Question

I have a working block of code, but something tells me it's not the most efficient.

What I have below seems to strip the match from the name and capture it just fine.

import re

alt_name = ""

name1 = "JUST A NAME"
name2 = "UNITED STATES STORE DBA USA INC"
name3 = "ANOTHER FIELD"

regex = re.compile(r"\b(DBA\b.{2,})|\b(ATTN\b.{2,})")
if re.search(regex, name1):
    match = re.search(regex, name1)
    alt_name = match.group(0)
    name1 = re.sub(regex, "", name1)
elif re.search(regex, name2):
    match = re.search(regex, name2)
    alt_name = match.group(0)
    name2 = re.sub(regex, "", name2)
elif re.search(regex, name3):
    match = re.search(regex, name3)
    alt_name = match.group(0)
    name3 = re.sub(regex, "", name3)

print(name1)
print(name2)
print(name3)
print(alt_name)

Is there a way to capture and strip with just 1 line instead of searching, matching and then subbing? I'm looking for efficiency and readability. Just making it short to be clever isn't what I'm going for. Maybe this is just the way to do it?

Tags: python, regex, python-3.x

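Answer

You can pass a method as the replacement argument to re.sub and have it save the matched text to a variable. If you want to remove the match, just return an empty string from that method.

However, you should rewrite your pattern to make it more efficient: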

r"\s*\b(?:DBA|ATTN)\b.{2,}"

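See the regex demo. Pattern details:

  • \s* - zero or more whitespace characters
  • \b - a word boundary
  • (?:DBA|ATTN) - a DBA or ATTN substring
  • \b - a word boundary
  • .{2,} - two or more characters other than line break characters, as many as possible.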
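
To see how the parts of the pattern work together, here is a minimal sketch that runs it against the sample names from the question. The names `pattern`, `samples` and `results` are only for illustration, and the ATTN sample is a made-up input that does not appear in the question.

```python
import re

# Pattern from the answer: optional leading whitespace, then DBA or ATTN
# as a whole word, then at least two more characters up to the end of the line.
pattern = re.compile(r"\s*\b(?:DBA|ATTN)\b.{2,}")

samples = [
    "JUST A NAME",
    "UNITED STATES STORE DBA USA INC",
    "ANOTHER FIELD",
    "ACME CORP ATTN BILLING",  # hypothetical extra sample, not from the question
]

results = []
for s in samples:
    m = pattern.search(s)
    # The leading \s* is part of the match, so trim it from the captured text.
    captured = m.group(0).lstrip() if m else ""
    stripped = pattern.sub("", s)
    results.append((stripped, captured))
    print(repr(stripped), repr(captured))
```

Because \s* is inside the match, the space before DBA or ATTN is removed together with the match, so the remaining name needs no extra trimming.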

Here is an example:

import re

class RegexMatcher:
    val = ''  # holds the most recently captured match
    rx = re.compile(r"\s*\b(?:DBA|ATTN)\b.{2,}")

    def runsub(self, m):
        # Called by re.sub for each match: save the text, then delete it.
        self.val = m.group(0).lstrip()
        return ""

    def process(self, s):
        return self.rx.sub(self.runsub, s)

rm = RegexMatcher()
name = "UNITED STATES STORE DBA USA INC"
print(rm.process(name))  # UNITED STATES STORE
print(rm.val)            # DBA USA INC

See the Python demo.

It may make more sense to make val a list and then call .append(m.group(0).lstrip()) on it.
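
A minimal sketch of that list-based variant follows. The class name `RegexCollector` and the attribute name `vals` are assumptions made for illustration and are not part of the original answer. The list is created per instance, so captures from several strings accumulate in order without being shared between collectors.

```python
import re

class RegexCollector:
    # Same pattern as in the example above.
    rx = re.compile(r"\s*\b(?:DBA|ATTN)\b.{2,}")

    def __init__(self):
        # Per-instance list, so separate collectors do not share captures.
        self.vals = []

    def runsub(self, m):
        # Record the matched text, then return "" so re.sub deletes it.
        self.vals.append(m.group(0).lstrip())
        return ""

    def process(self, s):
        return self.rx.sub(self.runsub, s)

rc = RegexCollector()
names = ["JUST A NAME", "UNITED STATES STORE DBA USA INC", "ANOTHER FIELD"]
cleaned = [rc.process(n) for n in names]
print(cleaned)
print(rc.vals)
```

Names without a match pass through unchanged and add nothing to the list, so an empty `vals` after processing means no name contained DBA or ATTN.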

