How to replace text followed by a comma (,) or period (.) from a tab-delimited file (txt)?

Question

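I'm new to AutoHotkey. I have a script that helps me shorten words I don't need, and I'm having trouble replacing text that is followed by a comma or a period. Here is my script: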

#NoEnv
#SingleInstance force
SetWorkingDir, %A_ScriptDir%
SendMode, Input
; -- Ctrl + SPACE -> Select all text + replace whole words only + title case
^SPACE::
NonCapitalized := "a|an|in|is|of|the|this|with" ; List of words that shouldn't be capitalized, separated by pipes
ReplacementsFile := "replacements.txt" ; Path to replacements file (tab delimited file with 2 columns, UTF-8-BOM, CR+LF)

Send, ^a ; Selects all text
Gosub, SelectToClip ; Copies the selected text to the clipboard
FileRead, Replacements, % ReplacementsFile ; Reads the replacements file
If ErrorLevel ; Error message if file is not found
{
MsgBox, % "File not found: " ReplacementsFile
Return
}

StringUpper, Clipboard, Clipboard, T ; Whole clipboard to title case
Clipboard := RegExReplace(Clipboard, "i)(?<![!?.]) \b(" NonCapitalized ")\b", " $L1") ; Changes to lowercase all words from the list "NonCapitalized", except those preceded by new line/period/exclamation mark/question mark
pos := 0
While pos := RegExMatch(Replacements, "m`a)^([^\t]+)\t(.*)$", FoundReplace, pos + 1) ; Gets all replacements from the tab delimited file
Clipboard := RegExReplace(Clipboard, "i)\b" FoundReplace1 "\b", FoundReplace2) ; Replaces all occurrences in the clipboard

; add exceptions
Clipboard := StrReplace(Clipboard, "Vice President,", "")
Clipboard := StrReplace(Clipboard, "Director,", "")
Clipboard := StrReplace(Clipboard, "Senior Vice President,", "")

; = End of exceptions

Clipboard := RegExReplace(Clipboard, "^\s+|\s+(?=([\s,;:.]))|\s$") ; Removes extra spaces
Send, ^v ; Pastes the clipboard
Return

SelectToClip:
Clipboard := ""
Send, ^c
ClipWait, 0
If ErrorLevel
Exit
Sleep, 50
Return

Here is part of my replacements file:

Chief Operating, Financial Officer  CFO & COO
Head,
President,

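My question: how can I put text followed by a comma (,) or a period (.) into the tab-delimited file, instead of adding more lines to the AHK file? As you know, the script doesn't treat the comma and the period as part of the text.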

Thank you very much for your time and help!

Tags: autohotkey

Solution


  1. Please indent your code; otherwise it is much harder to read.

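  2. In a regular expression, the \b assertion needs a word character on one side and a non-word character on the other. That is why your code cannot handle search strings that begin or end with a non-word character such as a comma or a period.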

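    ...\b and \B, since these are defined in terms of \w and \W.
    ...
    A word boundary is a position in the subject string where the current character and the previous character do not both match \w or \W (i.e. one matches \w and the other matches \W), or the start or end of the string if the first or last character matches \w, respectively.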
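To make point 2 concrete, here is a small Python sketch (an illustration only, not AutoHotkey). AutoHotkey's RegExMatch/RegExReplace use PCRE, and Python's re module treats \b, \w and lookarounds the same way for plain ASCII text. Python does not support \Q...\E, so re.escape stands in for it. The sample sentence is made up.

```python
import re

text = "Jane Doe, Vice President, Sales"
search = "Vice President,"  # search text that ends in a comma

# Built the way the question's script builds it: \b on both sides.
# The trailing "," is followed by " "; both are non-word characters,
# so there is no word boundary there and the pattern never matches.
with_word_boundaries = r"\b" + re.escape(search) + r"\b"

# Lookarounds only require "no word character" on each side, which also
# holds next to punctuation and at the start or end of the string.
with_lookarounds = r"(?<!\w)" + re.escape(search) + r"(?!\w)"

print(re.sub(with_word_boundaries, "", text, flags=re.IGNORECASE))
# prints "Jane Doe, Vice President, Sales" (unchanged)
print(re.sub(with_lookarounds, "", text, flags=re.IGNORECASE))
# prints "Jane Doe,  Sales" (two spaces; the script's cleanup step removes one)
```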

The following works in my tests:

#NoEnv
#SingleInstance force
SetWorkingDir %A_ScriptDir%
SendMode Input
; -- Ctrl + SPACE -> Select all text + replace whole words only + title case
^SPACE::
FunctionNameOfYourChoice() {
    ; Using static vars allows you to avoid reading the file over and over on each key press.
    ; Static initializers may run before SetWorkingDir does, so the path is made absolute.
    Static NonCapitalized   := "a|an|in|is|of|the|this|with" ; List of words that shouldn't be capitalized, separated by pipes
         , ReplacementsFile := A_ScriptDir "\replacements.txt" ; Path to replacements file (tab-delimited file with 2 columns, UTF-8-BOM, CR+LF)
         , Replacements     := ReadReplacements(ReplacementsFile)

    If (Replacements = "") { ; Error message if the file is not found (ReadReplacements then returns "")
        MsgBox % "File not found: " ReplacementsFile
        Return
    }
    Send ^a ; Selects all text
    SelectToClip() ; Copies the selected text to the clipboard

    ; 3. StringUpper is deprecated in v2.
    ; 4. Better to work on a plain variable than on the clipboard in terms of performance and reliability.
    cbCnt := Format("{:T}", Clipboard)   ; Whole clipboard to title case
    ; Changes to lowercase all words from the list "NonCapitalized", except those preceded by new line/period/exclamation mark/question mark
    cbCnt := RegExReplace(cbCnt, "i)(?<![!?.]) \b(" NonCapitalized ")\b", " $L1")
    ; Goes through each pair of search and replacement strings
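    Loop Parse, Replacements, `n, `r
    {
        If (A_LoopField = "") ; Skips blank lines
            Continue
        FoundReplace := StrSplit(A_LoopField, "`t")
        ; Replaces all occurrences in the text
        ; 5. (?<!\w) and (?!\w) instead of \b, so the search text may begin or end with "," or "."
        ;    \Q...\E makes any regex metacharacters in the search text literal
        cbCnt := RegExReplace(cbCnt, "i)(?<!\w)\Q" FoundReplace.1 "\E(?!\w)", FoundReplace.2)
    }
    ; 6. Capitalizes a lowercase letter that directly follows "<word character>-" (see Edit, point 2)
    cbCnt := RegExReplace(cbCnt, "(?<=\w-)([a-z])", "$U1")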
/*
    ; Now the following can go in the replacements.txt file instead. Lines are applied top to bottom, so put "Senior Vice President," above "Vice President,".
    cbCnt := StrReplace(cbCnt, "Vice President,")
    cbCnt := StrReplace(cbCnt, "Director,")
    cbCnt := StrReplace(cbCnt, "Senior Vice President,")
*/
    ; Removes extra spaces
    ; Note: this also deletes blank lines and strips the CR from each CR+LF line break. Are you sure you want to do this?
    Clipboard := RegExReplace(cbCnt, "^\s+|\s+(?=([\s,;:.]))|\s$")
    Send ^v ; Pastes the clipboard
}

SelectToClip() {
    Clipboard := ""
    Send ^c
    ClipWait 0.5   ; Waits up to 0.5 s for the clipboard (in v1, a value of 0 is treated as 0.5 anyway)
    If ErrorLevel
        Exit
    Sleep 50
}

ReadReplacements(path) {
    FileRead, Replacements, % path
    Return Replacements
}
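For anyone who wants to check the logic outside AutoHotkey, here is the same replacement loop sketched in Python. This is an illustration, not the script itself: REPLACEMENTS is a hypothetical stand-in for replacements.txt built from the entries in the question, and parse_replacements / apply_replacements are made-up names. A line without a tab deletes its search text, just as an empty FoundReplace.2 does in the AHK version. The last call shows that the cleanup regex also collapses a blank line.

```python
import re

# Hypothetical replacements.txt contents: search text, a tab, replacement text.
# Entries run top to bottom, so "Senior Vice President," precedes "Vice President,".
REPLACEMENTS = (
    "Chief Operating, Financial Officer\tCFO & COO\r\n"
    "Senior Vice President,\r\n"
    "Vice President,\r\n"
    "Head,\r\n"
)

# The same pattern as the script's "Removes extra spaces" step.
CLEANUP = r"^\s+|\s+(?=[\s,;:.])|\s$"


def parse_replacements(data):
    """Yield (search, replacement) pairs, skipping blank lines."""
    for line in data.split("\n"):
        line = line.rstrip("\r")  # drop the CR of CR+LF, like Loop Parse's `r
        if not line:
            continue
        search, _, replacement = line.partition("\t")  # no tab -> replacement ""
        yield search, replacement


def apply_replacements(text, data):
    for search, replacement in parse_replacements(data):
        # (?<!\w) ... (?!\w) instead of \b; re.escape instead of \Q...\E.
        pattern = r"(?<!\w)" + re.escape(search) + r"(?!\w)"
        # A function replacement inserts the text literally (no backslash escapes).
        text = re.sub(pattern, lambda _m, r=replacement: r, text, flags=re.IGNORECASE)
    return re.sub(CLEANUP, "", text)


print(apply_replacements("Jane Doe, Senior Vice President, Sales", REPLACEMENTS))
# Jane Doe, Sales
print(apply_replacements("Chief Operating, Financial Officer, Acme", REPLACEMENTS))
# CFO & COO, Acme
print(repr(apply_replacements("Sales.\r\n\r\nNext line", REPLACEMENTS)))
# 'Sales.\nNext line'
```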


Edit

  1. Yes, there was a typo in the second regex (in its first assertion); it has been corrected. I won't go into the issue with "and" here.

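  2. I added another RegExReplace as a not-so-elegant stopgap for the hyphen problem you described. Note, though, that this problem is inherently non-trivial, because the correct capitalization of such words depends on their meaning.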
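Here is a Python sketch of the two case-fixing regexes the Edit refers to, for experimenting outside AutoHotkey. AutoHotkey's $L1/$U1 case conversion in the replacement has no direct equivalent in Python's re, so a replacement function lowercases or uppercases the captured group instead. The function names and sample strings are made up. The "E-mail" case shows why the hyphen fix is only a stopgap.

```python
import re

# Mirrors the script's NonCapitalized list.
NON_CAPITALIZED = "a|an|in|is|of|the|this|with"


def lower_small_words(text):
    # Like RegExReplace(cbCnt, "i)(?<![!?.]) \b(" NonCapitalized ")\b", " $L1"):
    # a listed word after a space is lowercased, unless the space follows
    # "!", "?" or "." (the word then starts a new sentence).
    pattern = r"(?i)(?<![!?.]) \b(" + NON_CAPITALIZED + r")\b"
    return re.sub(pattern, lambda m: " " + m.group(1).lower(), text)


def capitalize_after_hyphen(text):
    # Like RegExReplace(cbCnt, "(?<=\w-)([a-z])", "$U1"): uppercases a
    # lowercase ASCII letter that directly follows "<word character>-".
    return re.sub(r"(?<=\w-)([a-z])", lambda m: m.group(1).upper(), text)


print(lower_small_words("Head Of The Board. The End"))  # Head of the Board. The End
print(capitalize_after_hyphen("Co-founder Of Non-profits"))  # Co-Founder Of Non-Profits
print(capitalize_after_hyphen("E-mail"))  # E-Mail (probably not what you want)
```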

