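python - Using regex in Python to add quantity identifiers before and after numeric quantity values
Question
I am using regex in Python to add quantity identifiers before and after numeric quantity values.
Basically, I have to add the word QtyOrd before a numeric quantity and the word Units after it, wherever they are not already in the text.
For example: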
'PartNo-001A description 20 units some other description' => 'PartNo-001A description QtyOrd 20 units some other description'
'PartNo-001A description QtyOrd 20 some other description' => 'PartNo-001A description QtyOrd 20 units some other description'
'PartNo-001A description QtyOrd 20' => 'PartNo-001A description QtyOrd 20 units'
'PartNo-001A QtyOrd 20' => 'PartNo-001A QtyOrd 20 units'
'PartNo-001A 20 units' => 'PartNo-001A QtyOrd 20 units'
The code I am using is as follows:
import re

def process_QtyOrd(text):
    for x in re.findall("(qtyord [0-9]+ units| [0-9]+ units|qtyord [0-9]+|qtyord[0-9]+units )", text.lower()):
        Text_Intermediate = "OrderQty " + str(re.search("[0-9]+", x).group()) + " Units"
        Text_Final = re.sub("(qtyord [0-9]+ units|[0-9]+ units|qtyord [0-9]+|qtyord [0-9]+ units)", Text_Intermediate, text, flags=re.IGNORECASE)
    return Text_Final

text1 = 'PartNo-001A description 20 units some other description'
text2 = '''
Could you please redirect the ticket to the correct sales department so they can provide assistance and a quote for the items
below.
QtyOrd 20 units some other description
'''
text3 = 'PartNo-001A description QtyOrd 20 some other description'
text4 = 'PartNo-001A description QtyOrd 20'
text5 = 'PartNo-001A QtyOrd 20'
text6 = 'PartNo-001A 20 units'
text7 = '''
Could you please redirect the ticket to the correct sales department so they can provide assistance and a quote for the items
below.
QtyOrd 20 units some other description PartNo-001A
'''
text8 = '''
Could you please redirect the ticket to the correct sales department so they can provide assistance and a quote for the items
below.
PartNo-001A
QtyOrd
20
'''
Then:
print(process_QtyOrd(text1))
print(process_QtyOrd(text2))
print(process_QtyOrd(text3))
print(process_QtyOrd(text4))
print(process_QtyOrd(text5))
print(process_QtyOrd(text6))
print(process_QtyOrd(text7))
print(process_QtyOrd(text8))
The code does not work for text8. Can you help me fix this?
The output should look like this:
1. PartNo-001A description QtyOrd 20 Units some other description
2. Could you please redirect the ticket to the correct sales department so they can provide assistance and a quote for the items
below.
QtyOrd 20 Units some other description
3. PartNo-001A description QtyOrd 20 Units some other description
4. PartNo-001A description QtyOrd 20 Units
5. PartNo-001A QtyOrd 20 Units
6. PartNo-001A QtyOrd 20 Units
7. Could you please redirect the ticket to the correct sales department so they can provide assistance and a quote for the items
below.
QtyOrd 20 Units some other description PartNo-001A
8. Could you please redirect the ticket to the correct sales department so they can provide assistance and a quote for the items
below.
PartNo-001A
QtyOrd
20
units
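Solution
Assuming your expected result for text8 is OrderQty 20 Units, not QtyOrd 20 units, please try:
import re

def process_QtyOrd(text):
    # Group 1: leading text, group 2: the quantity phrase, group 3: trailing text.
    m = re.match(r'^(.*?)(qtyord\s*\d+\s*units|\d+\s+units|qtyord\s+\d+)(.*)$', text, flags=re.IGNORECASE|re.DOTALL)
    if m:
        # Rebuild the quantity phrase around its number (avoid shadowing the built-in str).
        replaced = re.sub(r'\D*(\d+)\D*', r'OrderQty \1 Units', m.group(2))
        text = m.group(1) + replaced + m.group(3)
    return text
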
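- re.match splits the input text into three substrings: the part of interest that we want to modify, the leading substring, and the trailing substring.
- I have used \s instead of a literal space, so the whitespace between the words and the number may also be a newline, as in text8.
- The part of interest is captured by m.group(2). We can then modify it with the re.sub() function.
- Finally, text is the concatenation of the substrings above.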
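The split described in the list above can be sketched as follows. This is only an illustration: the sample string is a shortened, made-up variant of text8, and the names PATTERN, leading, interesting and trailing are hypothetical, not part of the original answer.

```python
import re

# Same pattern as in the answer; DOTALL lets "." cross line breaks.
PATTERN = r'^(.*?)(qtyord\s*\d+\s*units|\d+\s+units|qtyord\s+\d+)(.*)$'

# Hypothetical, shortened stand-in for text8: the number sits on its own line.
sample = 'PartNo-001A\nQtyOrd\n20\n'

# A literal space cannot match the newline between "QtyOrd" and "20".
space_match = re.search(r'qtyord [0-9]+', sample, flags=re.IGNORECASE)
print(space_match)  # None

# \s matches any whitespace, including newlines, so this match succeeds.
m = re.match(PATTERN, sample, flags=re.IGNORECASE | re.DOTALL)
leading, interesting, trailing = m.group(1), m.group(2), m.group(3)
print(repr(leading))      # 'PartNo-001A\n'
print(repr(interesting))  # 'QtyOrd\n20'
print(repr(trailing))     # '\n'
```

Because re.match reports only one match, this approach rewrites at most one quantity phrase per input string.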
[Update]
Please try the following:
def process_QtyOrd(text):
    text = re.sub(r'(qtyord\s*\d+\s*units|\d+\s+units|qtyord\s+\d+)', lambda m: re.sub(r'\D*(\d+)\D*', r'QtyOrd \1 Units', m.group(1)), text, flags=re.IGNORECASE|re.DOTALL)
    return text

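Now it works even if the text contains multiple occurrences of the pattern. I have also changed the word OrderQty in the replacement text to QtyOrd.
- Instead of the re.match() function from the previous answer, I have used the re.sub() function to replace all occurrences of the pattern in the text at once.
- A lambda function is introduced so that the replacement is evaluated as an expression for each match.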
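As a rough sketch of the same idea, the lambda can be swapped for a named function, which may be easier to read and test. The names QTY and normalize and the sample string are assumptions made for illustration; the pattern is the one from the answer.

```python
import re

# Pattern from the answer, compiled once; QTY is a hypothetical name.
QTY = re.compile(r'qtyord\s*\d+\s*units|\d+\s+units|qtyord\s+\d+',
                 flags=re.IGNORECASE)

def normalize(match):
    # re.sub calls this once per match; pull out the number and
    # rebuild the phrase in a single canonical form.
    number = re.search(r'\d+', match.group(0)).group()
    return f'QtyOrd {number} Units'

# Made-up input with two quantity phrases in one string.
sample = 'PartNo-001A 20 units, PartNo-002B QtyOrd 5'
result = QTY.sub(normalize, sample)
print(result)  # PartNo-001A QtyOrd 20 Units, PartNo-002B QtyOrd 5 Units
```

The digits inside the part numbers are left alone because they are followed by a letter rather than by whitespace and the word units.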
[Update 2]
If you want to include decimal numbers, try the following:
def process_QtyOrd(text):
    text = re.sub(r'(qtyord\s*\d+(?:\.\d+)?\s*units|\d+(?:\.\d+)?\s+units|qtyord\s+\d+(?:\.\d+)?)', lambda m: re.sub(r'\D*(\d+(?:\.\d+)?)\D*', r'QtyOrd \1 Units', m.group(1)), text, flags=re.IGNORECASE|re.DOTALL)
    return text

The idea is to replace \d+ with \d+(?:\.\d+)?, which matches one or more digits followed by an optional dot and more digits.
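To see what the number sub-pattern accepts, it can be tried on its own. The sketch below is an illustration only: NUMBER, QTY, checks and the sample strings are hypothetical names and inputs, and building the full pattern with an f-string is a convenience that the original answer does not use.

```python
import re

# Integer part, optionally followed by a dot and more digits.
NUMBER = r'\d+(?:\.\d+)?'

# fullmatch shows which strings the sub-pattern accepts as a whole.
checks = {s: re.fullmatch(NUMBER, s) is not None
          for s in ['20', '20.5', '20.', '.5']}
print(checks)  # {'20': True, '20.5': True, '20.': False, '.5': False}

# Assumed convenience: assemble the full pattern from NUMBER
# instead of repeating the sub-pattern three times.
QTY = re.compile(
    rf'qtyord\s*{NUMBER}\s*units|{NUMBER}\s+units|qtyord\s+{NUMBER}',
    flags=re.IGNORECASE)

result = QTY.sub(
    lambda m: f'QtyOrd {re.search(NUMBER, m.group(0)).group()} Units',
    'PartNo-001A 2.5 units')
print(result)  # PartNo-001A QtyOrd 2.5 Units
```

Note that a trailing dot with no digits after it, or a leading dot with no integer part, is not treated as part of the number.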