Replacing [[Words]] with other [[Words]] from a reference file in Notepad++ using Javascript

Problem description

I have a translation file that looks like this:

Apple=Apfel
Apple pie=Apfelkuchen
Banana=Banane
Bananaisland=Bananen Insel
Cherry=Kirsche
Train=Zug

...and more than 500 further lines like these

Now I have a file whose text needs to be processed. Only certain parts of the text should be replaced, for example:

The [[Apple]] was next to the [[Banana]]. Meanwhile the [[Cherry]] was chilling by the [[Train]]. 
The [[Apple pie]] tastes great on the [[Bananaisland]].

The result needs to be:

The [[Apfel]] was next to the [[Banane]]. Meanwhile the [[Kirsche]] was chilling by the [[Zug]].
The [[Apfelkuchen]] tastes great on the [[Bananen Insel]].

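There are far too many occurrences to copy and paste by hand. What is a simple way to search for [[XXX]], as shown above, and replace it using the entries from another file?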

I have spent many hours looking for help with this, to no avail. The closest I have got is this script:

import re
separators = "=", "\n"

def custom_split(sepr_list, str_to_split):
    # create regular expression dynamically
    regular_exp = '|'.join(map(re.escape, sepr_list))
    return re.split(regular_exp, str_to_split)

with open('D:/_working/paired-search-replace.txt') as f:
    for l in f:
        s = custom_split(separators, l)
        editor.replace(s[0], s[1])

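However, this replaces too much, or replaces inconsistently. For example, [[Apple]] is correctly replaced with [[Apfel]], but [[File:Apple.png]] is wrongly changed to [[File:Apfel.png]], and [[Apple pie]] becomes [[Apfel pie]]. I have spent hours tweaking the regular expression without success. Can anyone explain, in very simple terms, how I can fix this and achieve my goal?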

Tags: python, regex, replace, notepad++, regexp-replace

Solution


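This is slightly tricky because [ is a metacharacter in regular expressions, so the brackets have to be escaped to match literally.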
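As a small illustration of the escaping point (a sketch only; the sample sentence is made up for this example), `re.escape` can build the literal pattern so the brackets do not have to be escaped by hand:

```python
import re

# Unescaped, "[[Apple]]" would be parsed as a character class rather than
# as literal brackets. re.escape backslash-escapes every metacharacter.
pattern = re.escape("[[Apple]]")

# Hypothetical sample text, not taken from the original files.
sample = "The [[Apple]] sits next to [[File:Apple.png]]."

# A function as the replacement keeps the replacement text literal.
result = re.sub(pattern, lambda m: "[[Apfel]]", sample)
print(result)
```

Because the pattern includes both the opening and the closing brackets, only the exact link [[Apple]] matches and [[File:Apple.png]] is left alone.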

I'm sure there is a more efficient way to do this, but this works:

replaces="""Apple=Apfel
Apple pie=Apfelkuchen
Banana=Banane
Bananaisland=Bananen Insel
Cherry=Kirsche
Train=Zug"""


text = """
The [[Apple]] was next to the [[Banana]]. Meanwhile the [[Cherry]] was chilling by the [[Train]]. 
The [[Apple pie]] tastes great on the [[Bananaisland]].
"""

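if __name__ == '__main__':
    import re
    for replace in replaces.split('\n'):
        # Split on the first '=' only, in case a translation contains '='
        english, german = replace.split('=', 1)
        # re.escape guards against regex metacharacters in the search term;
        # the lambda keeps any backslashes in the replacement literal
        text = re.sub(rf'\[\[{re.escape(english)}\]\]', lambda m: f'[[{german}]]', text)

    print(text)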

Output:

The [[Apfel]] was next to the [[Banane]]. Meanwhile the [[Kirsche]] was chilling by the [[Zug]]. 
The [[Apfelkuchen]] tastes great on the [[Bananen Insel]].
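One possible alternative to running one substitution per entry, with 500+ pairs, is to load the pairs into a dictionary and make a single pass over the text, looking up whatever sits between each [[ and ]]. The following is only a sketch under that assumption; `load_pairs` and `translate_links` are names made up for this example, and the sample data is abbreviated.

```python
import re

def load_pairs(lines):
    """Build an {english: german} dict from 'key=value' lines."""
    pairs = {}
    for line in lines:
        line = line.rstrip("\n")
        if "=" not in line:
            continue  # skip blank or malformed lines
        english, german = line.split("=", 1)
        pairs[english] = german
    return pairs

def translate_links(text, pairs):
    """Replace [[key]] with [[value]] in one pass; unknown keys are kept."""
    def swap(match):
        key = match.group(1)
        return "[[" + pairs.get(key, key) + "]]"
    # [^\[\]]+ captures the whole content between the double brackets,
    # so "Apple pie" is looked up as one key, never as "Apple" + " pie".
    return re.sub(r"\[\[([^\[\]]+)\]\]", swap, text)

# Hypothetical, shortened sample data for illustration.
reference = ["Apple=Apfel\n", "Apple pie=Apfelkuchen\n", "Banana=Banane\n"]
sample = "The [[Apple]] and [[Apple pie]] sit by [[File:Apple.png]] and a [[Banana]]."

translated = translate_links(sample, load_pairs(reference))
print(translated)
```

Since an open file yields its lines when iterated, `load_pairs(open(path, encoding="utf-8"))` should work for the real reference file. Inside Notepad++ with the PythonScript plugin, something along the lines of `editor.setText(translate_links(editor.getText(), pairs))` would presumably apply the result to the open document, though that part is untested here.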
