How to use awk to extract a matched string and other data in a single command

Problem description

I'm trying to parse multiple files in a path for a certain string pattern (e.g. new File()), which may span multiple lines in a file.

The information I'm trying to return is:

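1 The file name/path
2 The number of occurrences of the string pattern in the file
3 The code found, i.e. new File()
4 The line number(s) where the code was found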

Here is the sample content of test.txt:

    new
    File()
    new File()
    new
    
    
    
    File()
    File() new
    new File() test new File()

[Screenshot: test.txt opened in Notepad++]

A more realistic, real-world example would be (made-up code, not compilable):

package gw.plugin.document.impl


@Export
abstract class BaseLocalDocumentContentSource implements 
InitializablePlugin
{

  private static var DOCUMENTS_PATH = "documents.path"

  public property get DemoDocumentsURL() : URL {
    return new URL("file", "", DemoDocumentsPath)
  }


  construct() {
  }

  protected function buildDocumentsPath(documentRootDir : String, 
  documentTmpDir : String) {
    if (DocumentsPathParameter.HasContent) {
      DemoDocumentsPath = getAbsolutePath(DocumentsPathParameter, 
documentRootDir)
      if (!new test 
      File(DemoDocumentsPath).equals(new File(DocumentsPathParameter))) {
          Logger.DOCUMENT.warn((typeof this).RelativeName + " has a 
relative path specified for its documents.path parameter, so it will store 
documents in the app container's temporary directory. For production use, 
the configuration should be changed to a full directory path, not a 
relative path")
          DocumentsPath = getAbsolutePath(DocumentsPathParameter, documentTmpDir)
      var file = new File(DocumentsPath)
      if (!file.exists() && file.isDirectory()) {
          file.mkdirs()
      }
  } else {
      DocumentsPath = DemoDocumentsPath
  }
}
Logger.DOCUMENT.info("Documents path: " + DocumentsPath)
  }

  protected function updateDocument(strDocUID : String, isDocument : InputStream) {
try {
    var file = getDocumentFile(strDocUID)
    if (!FileUtil.isFile( file ) || file.isReservedFileName()) {
        throw new IllegalArgumentException("Document ${strDocUID} does not exist!")
    }
    var backupFile = new File(file.getPath() + ".bak")
    if (not file.renameTo(backupFile) ) { // renamed physical file, 'file' still has previous name
      throw new RuntimeException("Failed to rename file to ${backupFile}")
        }
    copyToFile(isDocument, file)
    try {
      backupFile.delete()
    }
    catch (e : Throwable) {
      Logger.DOCUMENT.warn("DocMgmt failed to delete '${backupFile}'")
    }
} catch (e : Exception) {
    throw new RuntimeException("Exception encountered trying to update document with doc UID: ${strDocUID}", e)
}
  }

protected function getDocumentFile(relativePath : String, checkDemoFolder : boolean) : File {
var file = new File(getDocumentsDir(), relativePath)
if (!file.exists() && checkDemoFolder) {
    file = new File(getDemoDocumentsDir(), relativePath)
}
return file
}

  protected function makeSubDirPath(diw : IDocumentInfoWrapper) : String {
  var subDirPath = diw.getSubDirForDocument()
  var dirDoc = new File(getDocumentsDir() + subDirPath)
  if (not dirDoc.Directory) {
      dirDoc.mkdirs()
  }
  return subDirPath
  }


 private static function getAbsolutePath(path : String, rootPath : String) : String {
    var retVal = path
    if (path.startsWith("\\") || path.startsWith("/") || (path.length() > 1 && path.charAt(1) == ":" as char)) {
    retVal = path
    } else {
    retVal = rootPath + File.separator + path
    }
    try {
    retVal = (new File(retVal)).getCanonicalPath()
    } catch (e : IOException) {
    throw new RuntimeException("Could not get absolute path from relative path: ${path}", e)
    }
    return retVal.replaceAll("\\\\","/")
  }

}

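I've looked at grep, pcregrep, sed and awk. The folder I'm searching is very large, so I'm trying to return all the data I need in a single command instead of running four commands and traversing the folder multiple times.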
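
One way to get a single pass over a large folder is to let find hand batches of file names to one awk process (-exec ... {} +) and to rely on awk's per-file variables: FILENAME holds the current file name, and FNR restarts at 1 for each file, whereas NR keeps counting across all files. The sketch below assumes POSIX find and awk; scan_tree is a hypothetical name, and it only counts occurrences of new File() that sit on a single line, not ones split across lines.

```shell
# Hypothetical helper: report a per-file count for every regular file under "$1".
# With "-exec ... {} +", find passes many file names to each awk run, so each
# file is read once. FNR restarts at 1 for every file; NR does not.
# Only matches on a single line are counted. Empty files never reach
# FNR == 1, so they are not listed.
scan_tree() {
    find "$1" -type f -exec awk '
        # First line of a new file: report the file before it.
        FNR == 1 && prev != "" { print prev ": " (n + 0); n = 0 }
        FNR == 1 { prev = FILENAME }
        # gsub() returns the number of matches; "&" puts each one back unchanged.
        { n += gsub(/new[[:blank:]]+File\(\)/, "&") }
        END { if (prev != "") print prev ": " (n + 0) }
    ' {} +
}
```

A -name test can be added to the find command to restrict which files are scanned. If the file list is very long, find may start awk more than once; each run still reads its files exactly once.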

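awk seems the most suitable, but my experience with all of the programs mentioned is very limited, and I'm not authorized to install pcregrep in this environment, so I can't use it.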

Here is my awk attempt so far. It's wrong and probably badly written, so be gentle :)

    awk '{
       if(/new[[:space:]]*/) {
         line1=NR;
         code1=$0;
       } if(/File\(\)/) { 
         count[$0]++; 
         line2=NR; 
         if(line1 != line2) {
           code2=$0;
           printf "Found on lines %d, %d, code = %s %s \nNumber of occurrences = %d", line1, line2, code1, code2, count[$0]
         } else { 
           printf "Found on line %d, code = %s \nNumber of occurrences = %d", line1, code1, count[$0]
         } 
       }
    }' test.txt 

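I know my occurrence count is wrong, because I'm counting occurrences per match rather than the total in the file. I'm also getting some strange output, like this: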

     File()n lines 1, 2, code = new
     Number of occurrences = 1
     ound on line 3, code = new File()
    Number of occurrences = 1
     File()n lines 4, 8, code = new
     Number of occurrences = 2
     ound on line 9, code = File() new
    Number of occurrences = 1

where code2 overwrites the first few words of the print statement instead of being printed where I expect.
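
About the count: count[$0]++ uses the whole line text as an array key, so it counts how many times each distinct line was seen (lines 2 and 8 are both "File()", hence the 2), not how many matches the file contains. A plain counter, printed once in an END block, gives a file total. To count several matches on one line, a common idiom is gsub() with "&" as the replacement: gsub() returns the number of substitutions it made, and "&" stands for the matched text, so the line itself is unchanged. The sketch assumes a POSIX shell and awk; count_single_line is a hypothetical name, and it deliberately ignores matches split across lines.

```shell
# Hypothetical helper: total number of single-line "new File()" matches in "$1".
# gsub() returns how many replacements it made; replacing each match with
# itself ("&") leaves the line unchanged, so it works as a per-line counter.
# "new" and "File()" on different lines are NOT counted here.
count_single_line() {
    awk '{ n += gsub(/new[[:blank:]]+File\(\)/, "&") } END { print n + 0 }' "$1"
}
```

On the sample test.txt, count_single_line test.txt prints 3: one match on line 3 and two on line 10.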

The expected output would be something like:

    test.txt (Filename) 
    5 (number of occurrences of new File() pattern) 
    new File() Found on lines 1 & 2 
    new File() Found on line 3 
    new File() Found on lines 4 & 8 
    new File() Found on line 10 
    new File() Found on line 10 

or something similar.

The output of cat -vte test.txt is:

    new^M$
    File()^M$
    new File()^M$
    new ^M$
    ^M$
    ^M$
    ^M$
    File()^M$
    File() new^M$
    new File() test new File()
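
The ^M at the end of each line in the cat -vte output is a carriage return (\r), meaning test.txt has Windows (CRLF) line endings. When a printed string still contains \r, the terminal returns the cursor to column 0 and the following text overwrites the start of the line. That matches the first garbled line in the question: code1 was "new\r", so " File()" (seven characters) was printed over "Found o". A common fix is to delete the trailing \r before any other rule runs. The sketch below assumes a POSIX shell and an awk that accepts \r and [[:blank:]] in regular expressions (gawk and current mawk do); lines_ending_in_new is a hypothetical helper name.

```shell
# Hypothetical helper: print the numbers of lines in "$1" that end in "new".
# The first rule deletes a trailing carriage return (the ^M shown by cat -vte).
# Without it, "new^M" would not match new[[:blank:]]*$, because \r is not
# a [[:blank:]] character (blank means space or tab).
lines_ending_in_new() {
    awk '
        { sub(/\r$/, "") }              # turn CRLF line endings into plain LF
        /new[[:blank:]]*$/ { print FNR }
    ' "$1"
}
```

For the CRLF test.txt shown above it reports lines 1, 4 and 9. The same first rule can be placed at the top of other awk programs that read these files.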

Any help would be greatly appreciated.

Tags: bash, awk

Solution


You can use this awk command:

awk -v msg='new File() Found on line ' 'BEGIN {print ARGV[1], "(Filename)"} {while(match($0, /new[[:blank:]]+File\(\)/)) {print msg NR; ++n; $0 = substr($0, RSTART+RLENGTH)}} /new[[:blank:]]*$/ {p = NR; next} p && NF {if (/^[[:blank:]]*File\(\)/) {print msg p, "&", NR; ++n} p = 0} END {print n, "(number of occurrences of new File() pattern)"}' test.txt

Output:

test.txt (Filename)
new File() Found on line 1 & 2
new File() Found on line 3
new File() Found on line 4 & 8
new File() Found on line 10
new File() Found on line 10
5 (number of occurrences of new File() pattern)

A more readable form:

awk -v msg='new File() Found on line ' '
BEGIN {print ARGV[1], "(Filename)"}
{
   while(match($0, /new[[:blank:]]+File\(\)/)) {
      print msg NR
      ++n
      $0 = substr($0, RSTART+RLENGTH)
   }
}
/new[[:blank:]]*$/ {
   p = NR
   next
}
p && NF {
   if (/^[[:blank:]]*File\(\)/) {
      print msg p, "&", NR
      ++n
   }
   p = 0
}
END {
   print n, "(number of occurrences of new File() pattern)"
}' test.txt
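
The answer's program prints ARGV[1] and uses NR, which fits a single input file, and its $-anchored regex expects Unix line endings. For the folder-wide case in the question, a possible variant (a sketch, not the answerer's code) resets its state on FNR == 1, prints FILENAME, deletes a trailing \r, and collects each file's matches in a string so the count can be printed before the locations, as in the requested output. It also checks a pending "new" line before the new[[:blank:]]*$ rule instead of using next, so a line like "File() new" can complete one split match and start another. report_new_file is a hypothetical name; like the original, it does not check word boundaries, so "renew File()" would also count.

```shell
# Hypothetical multi-file variant of the awk program above (a sketch only).
# Per-file FILENAME/FNR instead of ARGV[1]/NR, trailing \r removed, and each
# file's matches buffered so the count is printed before the locations.
# Empty files never reach FNR == 1, so they produce no report.
report_new_file() {
    awk -v msg='new File() Found on line ' '
    function flush() {                   # print the report for the previous file
        if (file == "") return
        print file, "(Filename)"
        print n + 0, "(number of occurrences of new File() pattern)"
        printf "%s", out
    }
    FNR == 1 { flush(); file = FILENAME; n = 0; out = ""; p = 0 }
    { sub(/\r$/, "") }                   # tolerate CRLF line endings
    {
        # Matches on one line; work on a copy so $0 stays intact.
        line = $0
        while (match(line, /new[[:blank:]]+File\(\)/)) {
            out = out msg FNR "\n"; ++n
            line = substr(line, RSTART + RLENGTH)
        }
    }
    # A pending "new" from an earlier line, completed by a leading File().
    p && NF {
        if (/^[[:blank:]]*File\(\)/) { out = out msg p " & " FNR "\n"; ++n }
        p = 0
    }
    # Remember a line ending in "new"; checked after the rule above, so one
    # line can both finish an earlier match and start a new one.
    /new[[:blank:]]*$/ { p = FNR }
    END { flush() }
    ' "$@"
}
```

For the sample file, report_new_file test.txt prints the file name, then 5, then the same five location lines as the output shown above; given several files, it prints one such block per file.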
