Adding Quantity Identifiers Before and After Numeric Quantity Values Using Regex in Python

Problem description

I am using regex in Python to add quantity identifiers before and after numeric quantity values.

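Basically, I have to add the word QtyOrd before the numeric quantity and the word Units after it, whenever they are not already in the text.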

For example:

'PartNo-001A description 20 units some other description' => 'PartNo-001A description QtyOrd 20 units some other description'
'PartNo-001A description QtyOrd 20 some other description' => 'PartNo-001A description QtyOrd 20 units some other description'
'PartNo-001A description QtyOrd 20' => 'PartNo-001A description QtyOrd 20 units'
'PartNo-001A QtyOrd 20' => 'PartNo-001A QtyOrd 20 units'
'PartNo-001A 20 units' => 'PartNo-001A QtyOrd 20 units'

The code I am using is as follows:

import re

def process_QtyOrd( text):
    for x in re.findall("(qtyord [0-9]+ units| [0-9]+ units|qtyord [0-9]+|qtyord[0-9]+units )", text.lower()):
        Text_Intermediate = "OrderQty " + str(re.search("[0-9]+", x).group()) + " Units"
    
    Text_Final = re.sub("(qtyord [0-9]+ units|[0-9]+ units|qtyord [0-9]+|qtyord [0-9]+ units)", Text_Intermediate, text, flags= re.IGNORECASE)

    return Text_Final

text1 = 'PartNo-001A description 20 units some other description'
text2 = '''
Could you please redirect the ticket to the correct sales department so they can provide assistance and a quote for the items
below.

QtyOrd 20 units some other description

''' 
text3 = 'PartNo-001A description QtyOrd 20 some other description'
text4 = 'PartNo-001A description QtyOrd 20'
text5 = 'PartNo-001A QtyOrd 20'
text6 = 'PartNo-001A 20 units'
text7 = ''' 

Could you please redirect the ticket to the correct sales department so they can provide assistance and a quote for the items
below.

QtyOrd 20 units some other description PartNo-001A
 
''' 

text8 = ''' 

Could you please redirect the ticket to the correct sales department so they can provide assistance and a quote for the items
below.

PartNo-001A

QtyOrd 
20

''' 

Then:

print(process_QtyOrd(text1))
print(process_QtyOrd(text2))
print(process_QtyOrd(text3))
print(process_QtyOrd(text4))
print(process_QtyOrd(text5))
print(process_QtyOrd(text6))
print(process_QtyOrd(text7))
print(process_QtyOrd(text8))
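Besides the text8 problem, the loop in process_QtyOrd keeps overwriting Text_Intermediate, so only the last match survives, and re.sub then writes that one string over every match. The sketch below copies the function unchanged (comments added) and runs it on a hypothetical input with two different quantities:

```python
import re

# Copy of the question's process_QtyOrd, unchanged apart from comments.
def process_QtyOrd(text):
    for x in re.findall("(qtyord [0-9]+ units| [0-9]+ units|qtyord [0-9]+|qtyord[0-9]+units )", text.lower()):
        # Overwritten on every iteration: only the last match survives the loop.
        Text_Intermediate = "OrderQty " + str(re.search("[0-9]+", x).group()) + " Units"
    # Every match is replaced with that single string.
    Text_Final = re.sub("(qtyord [0-9]+ units|[0-9]+ units|qtyord [0-9]+|qtyord [0-9]+ units)", Text_Intermediate, text, flags=re.IGNORECASE)
    return Text_Final

# Hypothetical text with two different quantities.
result = process_QtyOrd("PartNo-001A 20 units, PartNo-002B 5 units")
print(result)  # PartNo-001A OrderQty 5 Units, PartNo-002B OrderQty 5 Units
```

The quantity 20 is lost, because both matches receive the string built from the last one.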

The code does not work for text8. Could you help me fix this?

The output should look like this:

1. PartNo-001A description QtyOrd 20 Units some other description
 

2. Could you please redirect the ticket to the correct sales department so they can provide assistance and a quote for the items
below.

QtyOrd 20 Units some other description


3. PartNo-001A description QtyOrd 20 Units some other description
4. PartNo-001A description QtyOrd 20 Units
5. PartNo-001A QtyOrd 20 Units
6. PartNo-001A QtyOrd 20 Units
 

7. Could you please redirect the ticket to the correct sales department so they can provide assistance and a quote for the items
below.

QtyOrd 20 Units some other description PartNo-001A
 

8. Could you please redirect the ticket to the correct sales department so they can provide assistance and a quote for the items
below.

PartNo-001A

QtyOrd 
20
units
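The failure for text8 can be reproduced in isolation. The patterns in the question's code put a literal space between qtyord and the digits, but in text8 the number sits on the next line. A minimal check, using a shortened, hypothetical excerpt of text8:

```python
import re

# Shortened excerpt of text8: the quantity follows "QtyOrd " on the next line.
sample = "PartNo-001A\n\nQtyOrd \n20\n"

# A literal space only matches the space character, so "QtyOrd \n20" is not found.
literal_space = re.search(r"qtyord [0-9]+", sample, flags=re.IGNORECASE)

# \s matches any whitespace character, including the newline.
any_whitespace = re.search(r"qtyord\s+[0-9]+", sample, flags=re.IGNORECASE)

print(literal_space)                 # None
print(repr(any_whitespace.group()))  # 'QtyOrd \n20'
```

Because re.findall returns an empty list for text8, the loop body never runs, Text_Intermediate is never assigned, and the following re.sub call raises UnboundLocalError.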

Tags: python, regex, regex-lookarounds, regex-group, re

Solution


Assuming your expected result for text8 is OrderQty 20 Units rather than QtyOrd 20 units, would you please try:

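import re

def process_QtyOrd(text):
    # Split text into: leading part, quantity fragment, trailing part.
    # \s (not a literal space) lets the fragment span line breaks, as in text8.
    m = re.match(r'^(.*?)(qtyord\s*\d+\s*units|\d+\s+units|qtyord\s+\d+)(.*)$', text, flags=re.IGNORECASE|re.DOTALL)
    if m:
        # Keep only the number and rebuild the fragment in canonical form.
        qty = re.sub(r'\D*(\d+)\D*', r'OrderQty \1 Units', m.group(2))
        text = m.group(1) + qty + m.group(3)
    return text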
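  • re.match splits the input text into three substrings: the leading substring, the part of interest that we want to modify, and the trailing substring.
  • I have used \s instead of a literal space so that newlines are matched as well.
  • The part of interest is captured by m.group(2). We can then modify it with the re.sub() function.
  • Finally, text is the concatenation of the substrings above.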
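To make the three-way split concrete, here is a small sketch that applies the same pattern to text1 and prints each group before reassembling them (the OrderQty wording follows the answer above):

```python
import re

# Same pattern as the answer: leading text, part of interest, trailing text.
PATTERN = r'^(.*?)(qtyord\s*\d+\s*units|\d+\s+units|qtyord\s+\d+)(.*)$'

sample = "PartNo-001A description 20 units some other description"
m = re.match(PATTERN, sample, flags=re.IGNORECASE | re.DOTALL)

print(repr(m.group(1)))  # 'PartNo-001A description '
print(repr(m.group(2)))  # '20 units'
print(repr(m.group(3)))  # ' some other description'

# Rewrite only the middle part, then glue the three pieces back together.
qty = re.sub(r'\D*(\d+)\D*', r'OrderQty \1 Units', m.group(2))
result = m.group(1) + qty + m.group(3)
print(result)  # PartNo-001A description OrderQty 20 Units some other description
```

The lazy (.*?) makes group 1 as short as possible, so group 2 is the first quantity fragment in the text; any later fragments end up untouched inside group 3.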

[Update]

Would you please try the following:

def process_QtyOrd(text):
    # The lambda rebuilds each matched fragment as "QtyOrd <number> Units".
    return re.sub(r'(qtyord\s*\d+\s*units|\d+\s+units|qtyord\s+\d+)',
                  lambda m: re.sub(r'\D*(\d+)\D*', r'QtyOrd \1 Units', m.group(1)),
                  text, flags=re.IGNORECASE|re.DOTALL)

Now it works even if the text contains multiple occurrences of the pattern. I have also changed the word OrderQty in the replacement text to QtyOrd.

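  • Instead of the re.match() function used in the previous answer, I use the re.sub() function to replace all occurrences of the pattern in the text at once.
  • A lambda function is introduced so that the replacement is computed as an expression for each match.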
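The callable-replacement idea can be checked on a hypothetical string with two quantities. This sketch uses a named function instead of the lambda, which is equivalent: re.sub calls it once per match and inserts whatever it returns.

```python
import re

PATTERN = r'(qtyord\s*\d+\s*units|\d+\s+units|qtyord\s+\d+)'

def normalize(m):
    # Keep only the number from the matched fragment and rebuild it in canonical form.
    return re.sub(r'\D*(\d+)\D*', r'QtyOrd \1 Units', m.group(1))

# Hypothetical text with two quantities written in different styles.
sample = "PartNo-001A 20 units, PartNo-002B QtyOrd 5"

result = re.sub(PATTERN, normalize, sample, flags=re.IGNORECASE)
print(result)  # PartNo-001A QtyOrd 20 Units, PartNo-002B QtyOrd 5 Units
```

Each match gets its own number, unlike the question's version, which reused the last match for every replacement.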

[Update 2]

If you want to include decimal numbers, please try the following:

def process_QtyOrd(text):
    # Same as above, with an optional decimal part (?:\.\d+)? after each \d+.
    return re.sub(r'(qtyord\s*\d+(?:\.\d+)?\s*units|\d+(?:\.\d+)?\s+units|qtyord\s+\d+(?:\.\d+)?)',
                  lambda m: re.sub(r'\D*(\d+(?:\.\d+)?)\D*', r'QtyOrd \1 Units', m.group(1)),
                  text, flags=re.IGNORECASE|re.DOTALL)

The idea is to replace \d+ with \d+(?:\.\d+)?, which matches digits followed by an optional dot and more digits.
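Why the change matters can be seen by running both patterns on a hypothetical decimal quantity. With the integer-only pattern, \d+\s+units only finds "5 units", so the "2." is left stranded in front of the inserted text:

```python
import re

# Integer-only pattern from the first update, and the decimal-aware one from Update 2.
INT_PATTERN = r'(qtyord\s*\d+\s*units|\d+\s+units|qtyord\s+\d+)'
DEC_PATTERN = r'(qtyord\s*\d+(?:\.\d+)?\s*units|\d+(?:\.\d+)?\s+units|qtyord\s+\d+(?:\.\d+)?)'

sample = "PartNo-001A 2.5 units"

int_result = re.sub(INT_PATTERN,
                    lambda m: re.sub(r'\D*(\d+)\D*', r'QtyOrd \1 Units', m.group(1)),
                    sample, flags=re.IGNORECASE)
dec_result = re.sub(DEC_PATTERN,
                    lambda m: re.sub(r'\D*(\d+(?:\.\d+)?)\D*', r'QtyOrd \1 Units', m.group(1)),
                    sample, flags=re.IGNORECASE)

print(int_result)  # PartNo-001A 2.QtyOrd 5 Units
print(dec_result)  # PartNo-001A QtyOrd 2.5 Units
```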

