Regex to match a substring but not a containing word (word boundary problem)

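Problem description

I have 100,000 files (mostly office-type documents). I am using Excel VBA to check for all file names that contain the word "list", while trying to avoid false positives (e.g. "specialist").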

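The answer given to "Regular expression that matches a substring but not the containing word" comes very close to what I need (\b(?!String)\w*ring\w*\b), except that my file names do not have neat word boundaries.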

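The current pattern \b(?!specialist)\w*list\w*\b correctly ignores some variants (3 Specialist, 6-specialist, Specialists). Can the pattern be modified so that it also correctly rejects the following variants: 1Specialist, 2_specialist, Xspecialists? If so, could someone point me in the right direction?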

Many thanks for any help or suggestions, M

Here is the recursive subroutine I have been using (apologies for the poor formatting):

Sub RecursiveFolderPATTERN(objFolder As Scripting.Folder, _
                           IncludeSubfolders As Boolean)

    'Declare the variables
    Dim objFile As Object
    Dim objSubFolder As Scripting.Folder
    Dim NextRow As Long

    'Case-insensitive pattern: "list" or "lists", optionally prefixed by
    'address/info/data, with no letter immediately before or after
    Dim objRegExp As Object
    Set objRegExp = CreateObject("VBScript.RegExp")
    objRegExp.Pattern = "([^A-Za-z]|^)(address|info|data)?lists?([^A-Za-z]|$)"
    objRegExp.IgnoreCase = True

    'Find the next available row
    NextRow = Cells(Rows.Count, "A").End(xlUp).Row + 1

    'Loop through each file in the folder
    For Each objFile In objFolder.Files
        'This is the line that the answer edit below changes
        If objRegExp.test(objFile) Then
            Cells(NextRow, "A").Value = objFile.Name
            Cells(NextRow, "E").Value = objFile.Size
            Cells(NextRow, "F").Value = objFile.Type
            Cells(NextRow, "G").Value = objFile.DateCreated
            Cells(NextRow, "H").Value = objFile.DateLastAccessed
            Cells(NextRow, "I").Value = objFile.DateLastModified
            Cells(NextRow, "J").Value = objFile.Path
            NextRow = NextRow + 1
        End If
    Next objFile

    'Loop through files in the subfolders
    If IncludeSubfolders Then
        For Each objSubFolder In objFolder.Subfolders
            Call RecursiveFolderPATTERN(objSubFolder, True)
        Next objSubFolder
    End If

End Sub

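Answer edit: changing the line "If objRegExp.test(objFile) Then" to "If objRegExp.test(objFile.Name) Then" fixed the problem. Presumably, passing the File object itself tested its default property (the full path), so a matching folder name was enough to produce a hit.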
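The difference is easy to reproduce with plain strings. The following is a minimal sketch, not the original code: the path C:\Mailing lists\budget.xlsx and the Sub name DemoPathVersusName are made up for illustration, and it assumes that passing a File object to Test evaluates to its full path.

```vba
Sub DemoPathVersusName()
    Dim re As Object, fso As Object
    Dim fullPath As String, nameOnly As String

    Set re = CreateObject("VBScript.RegExp")
    re.IgnoreCase = True
    re.Pattern = "([^A-Za-z]|^)(address|info|data)?lists?([^A-Za-z]|$)"

    'GetFileName only parses the string; the file does not need to exist
    Set fso = CreateObject("Scripting.FileSystemObject")
    fullPath = "C:\Mailing lists\budget.xlsx"   'hypothetical path
    nameOnly = fso.GetFileName(fullPath)        '"budget.xlsx"

    Debug.Print re.Test(fullPath)   'True: the folder name "Mailing lists" matches
    Debug.Print re.Test(nameOnly)   'False: the file name itself has no "list"
End Sub
```

Testing only the name keeps folder names from turning every file beneath them into a hit.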

Alternative answer edit: changing the pattern from "([^A-Za-z]|^)(address|info|data)?lists?([^A-Za-z]|$)" to "(^(?!.*specialist).*list.*$)" also works well. Both approaches have their merits, so I plan to use both.
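To see where the two patterns disagree, a small comparison can help. This is only a sketch: the Sub name ComparePatterns and the four sample file names are hypothetical and chosen purely for illustration.

```vba
Sub ComparePatterns()
    Dim reBoundary As Object, reLookahead As Object
    Dim names As Variant, i As Long

    'Pattern 1: "list(s)" with no letter directly before or after
    Set reBoundary = CreateObject("VBScript.RegExp")
    reBoundary.IgnoreCase = True
    reBoundary.Pattern = "([^A-Za-z]|^)(address|info|data)?lists?([^A-Za-z]|$)"

    'Pattern 2: any "list", unless "specialist" occurs anywhere in the name
    Set reLookahead = CreateObject("VBScript.RegExp")
    reLookahead.IgnoreCase = True
    reLookahead.Pattern = "(^(?!.*specialist).*list.*$)"

    'Hypothetical file names
    names = Array("The big list of PII.csv", _
                  "GlobalList.doc", _
                  "specialist list.xlsx", _
                  "2_specialist.doc")

    'Columns: name, boundary pattern result, lookahead pattern result
    For i = LBound(names) To UBound(names)
        Debug.Print names(i), reBoundary.Test(names(i)), reLookahead.Test(names(i))
    Next i
End Sub
```

With these samples, both patterns accept the first name and reject the last. Only the lookahead pattern accepts GlobalList.doc, because there is no separator before "List", and only the boundary pattern accepts specialist list.xlsx, because the lookahead rejects any name that contains "specialist".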

Tags: regex

Solution

If your goal is to find file names that match "list" but not "specialist", try the following regular expression:

(?i)^(?!.*specialist).*list.*$

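Edit

Remove (?i) from the pattern, since VBScript.RegExp does not support inline modifiers (case-insensitivity is set through the IgnoreCase property instead), and test it with the following snippet: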

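Sub RecursiveFolderPATTERN()
  'Test harness: prints only the sample names that the pattern accepts
  Dim objRegExp As Object, arrStrings() As String, i As Long
  Set objRegExp = CreateObject("VBScript.RegExp")
  With objRegExp
    .Global = True
    .IgnoreCase = True      'replaces the unsupported inline (?i)
    .MultiLine = False
    'Reject any name containing "specialist", then require "list" somewhere
    .Pattern = "^(?!.*specialist).*list.*$"
  End With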
  Dim TestString As String
  TestString = "3 Specialist" & vbNewLine & _
               "6-specialist" & vbNewLine & _
               "Specialists" & vbNewLine & _
               "true SpeciaList" & vbNewLine & _
               "1 Specialist" & vbNewLine & _
               "2_specialist" & vbNewLine & _
               "Xspecialists" & vbNewLine & _
               "TheListOfSpecialists.xlsx" & vbNewLine & _
               "List" & vbNewLine & _
               "lISTs" & vbNewLine & _
               "Globalistics" & vbNewLine & _
               "GlobalList.doc" & vbNewLine & _
               "fatalistic" & vbNewLine & _
               "The big list of PII.csv" & vbNewLine & _
               "A few lISTs with something.xls"
  arrStrings = Split(TestString, vbNewLine)
  For i = LBound(arrStrings) To UBound(arrStrings)
    If objRegExp.Test(arrStrings(i)) Then
      Debug.Print arrStrings(i)
    End If
  Next
End Sub
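
Note that the lookahead rejects a whole name as soon as "specialist" appears anywhere in it, so TheListOfSpecialists.xlsx is not printed even though it also contains a separate "List". If such names should count as hits, one possible workaround is to blank out every "specialist" first and then look for a remaining "list". This is a sketch based on an assumption about the requirement, not part of the original answer, and HasListOutsideSpecialist is a hypothetical helper name.

```vba
Function HasListOutsideSpecialist(ByVal fileName As String) As Boolean
    Dim re As Object, stripped As String

    Set re = CreateObject("VBScript.RegExp")
    re.IgnoreCase = True
    re.Global = True            'replace every occurrence, not just the first

    'Step 1: replace each "specialist" with a placeholder character so that
    'the surrounding letters cannot join up into a new "list"
    re.Pattern = "specialist"
    stripped = re.Replace(fileName, "#")

    'Step 2: any "list" that is left was not part of "specialist"
    re.Pattern = "list"
    HasListOutsideSpecialist = re.Test(stripped)
End Function
```

For example, HasListOutsideSpecialist("TheListOfSpecialists.xlsx") returns True, while HasListOutsideSpecialist("Xspecialists") returns False. Words such as "fatalistic" are still accepted, exactly as with the lookahead pattern.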
