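python - Multiple regex substitutions in Python
Problem description
I originally had this working script, which checks the CSV files in a folder and replaces one substring: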
import fileinput
import os
import glob
#### Directory and file mask
this = r"C:\work\PythonScripts\Replacer\*.csv"
output_folder = "C:\\work\\PythonScripts\\Replacer\\"
#### Get files
files = glob.glob(this)
#### Section to replace
text_to_search = 'z'
replacement_text = 'ZZ_Top'
#### Loop through files and lines:
for f in files:
    head, tail = os.path.split(f)
    targetFileName = os.path.join(head, output_folder, tail)
    with fileinput.FileInput(targetFileName, inplace=True, backup='.bak') as file:
        for line in file:
            print(line.replace(text_to_search, replacement_text), end='')
I also need to replace several Word-style curly quotes and en dashes, so I thought of using something like this inside the loop above:
s = '’ ‘ ’ ‘ ’ – “ ” “ – ’'
print(s)
print(s.replace('’', '\'').replace('‘', '\'').replace('–','-').replace('“','"').replace('”','"'))
==>
’ ‘ ’ ‘ ’ – “ ” “ – ’
' ' ' ' ' - " " " - '
But then I came across the following answer, which uses a regex substitution function: https://stackoverflow.com/a/765835
So I tried it, and on its own it works fine:
import re
def multisub(subs, subject):
    """Simultaneously perform all substitutions on the subject string."""
    pattern = '|'.join('(%s)' % re.escape(p) for p, s in subs)
    substs = [s for p, s in subs]
    replace = lambda m: substs[m.lastindex - 1]
    return re.sub(pattern, replace, subject)
print(multisub([('’', '\''), ('‘', '\''), ('–','-'), ('“','"'), ('”','"')], '1’ 2‘ 1’ 2‘ 1’ 3– 4“ 5” 4“ 3– 2’'))
==>
1' 2' 1' 2' 1' 3- 4" 5" 4" 3- 2'
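As a side note on what "simultaneously" buys you: the sketch below contrasts chained `str.replace` calls with the single-pass `multisub` from the linked answer. The `swap` pairs and the sample text are made up for illustration. With chained calls, a later replacement can rewrite the output of an earlier one; the regex version scans the string once, and `m.lastindex` tells it which alternative matched. For the quote and dash replacements in this question the two approaches should give the same result, because no replacement text contains another search string.

```python
import re

def multisub(subs, subject):
    """Perform all substitutions on subject in a single pass."""
    # One capturing group per search string: (cat)|(dog)
    pattern = '|'.join('(%s)' % re.escape(p) for p, s in subs)
    substs = [s for p, s in subs]
    # m.lastindex is the 1-based number of the group that matched
    return re.sub(pattern, lambda m: substs[m.lastindex - 1], subject)

# Illustrative data (not from the question): swap two words
swap = [('cat', 'dog'), ('dog', 'cat')]
text = 'cat chases dog'

# Chained replace: the second call also rewrites the first call's output
chained = text
for old, new in swap:
    chained = chained.replace(old, new)

simultaneous = multisub(swap, text)

print(chained)       # cat chases cat
print(simultaneous)  # dog chases cat
```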
But as soon as I paste it into the original script, the script runs but does not modify the files:
import fileinput
import os
import glob
import re
#### Directory and file mask
this = r"C:\work\PythonScripts\Replacer\*.csv"
output_folder = "C:\\work\\PythonScripts\\Replacer\\"
#### RegEx substitution func
def multisub(subs, subject):
    """Simultaneously perform all substitutions on the subject string."""
    pattern = '|'.join('(%s)' % re.escape(p) for p, s in subs)
    substs = [s for p, s in subs]
    replace = lambda m: substs[m.lastindex - 1]
    return re.sub(pattern, replace, subject)
#### Get files
files = glob.glob(this)
#### Loop through files and lines:
for f in files:
    head, tail = os.path.split(f)
    targetFileName = os.path.join(head, output_folder, tail)
    with fileinput.FileInput(targetFileName, inplace=True, backup='.bak') as file:
        for line in file:
            print(multisub([('’', '\''), ('‘', '\''), ('–','-'), ('“','"'), ('”','"')], line), end='')
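What is wrong here?
Solution
When I tested it, your code actually worked for me, but it does a lot of unnecessary processing that could introduce errors. The biggest advantage of using fileinput over a regular open is that it can loop over the lines of multiple files without needing another loop to open each file individually. So try this and see whether it works: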
#### Get files
files = glob.glob(this)
#### Loop through files and lines:
for line in fileinput.input(files, inplace=True, backup='.bak'):
    print(multisub([('’', '\''), ('‘', '\''), ('–','-'), ('“','"'), ('”','"')], line), end='')