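bash - Using awk, how to extract the matched string and other data in a single command
Problem description
I'm trying to parse multiple files under a path for a string pattern (e.g. new File()), which may span multiple lines within a file.
The information I'm trying to return is:
1 The file name/path
2 The number of occurrences of the string pattern in the file
3 The code that was found, i.e. new File()
4 The line number(s) on which the code was found
Here is the sample content of test.txt: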
new
File()
new File()
new



File()
File() new
new File() test new File()
A more realistic, real-world example is (made-up code, not compilable):
package gw.plugin.document.impl
@Export
abstract class BaseLocalDocumentContentSource implements
InitializablePlugin
{
private static var DOCUMENTS_PATH = "documents.path"
public property get DemoDocumentsURL() : URL {
return new URL("file", "", DemoDocumentsPath)
}
construct() {
}
protected function buildDocumentsPath(documentRootDir : String,
documentTmpDir : String) {
if (DocumentsPathParameter.HasContent) {
DemoDocumentsPath = getAbsolutePath(DocumentsPathParameter,
documentRootDir)
if (!new test
File(DemoDocumentsPath).equals(new File(DocumentsPathParameter))) {
Logger.DOCUMENT.warn((typeof this).RelativeName + " has a
relative path specified for its documents.path parameter, so it will store
documents in the app container's temporary directory. For production use,
the configuration should be changed to a full directory path, not a
relative path")
DocumentsPath = getAbsolutePath(DocumentsPathParameter, documentTmpDir)
var file = new File(DocumentsPath)
if (!file.exists() && file.isDirectory()) {
file.mkdirs()
}
} else {
DocumentsPath = DemoDocumentsPath
}
}
Logger.DOCUMENT.info("Documents path: " + DocumentsPath)
}
protected function updateDocument(strDocUID : String, isDocument : InputStream) {
try {
var file = getDocumentFile(strDocUID)
if (!FileUtil.isFile( file ) || file.isReservedFileName()) {
throw new IllegalArgumentException("Document ${strDocUID} does not exist!")
}
var backupFile = new File(file.getPath() + ".bak")
if (not file.renameTo(backupFile) ) { // renamed physical file, 'file' still has previous name
throw new RuntimeException("Failed to rename file to ${backupFile}")
}
copyToFile(isDocument, file)
try {
backupFile.delete()
}
catch (e : Throwable) {
Logger.DOCUMENT.warn("DocMgmt failed to delete '${backupFile}'")
}
} catch (e : Exception) {
throw new RuntimeException("Exception encountered trying to update document with doc UID: ${strDocUID}", e)
}
}
protected function getDocumentFile(relativePath : String, checkDemoFolder : boolean) : File {
var file = new File(getDocumentsDir(), relativePath)
if (!file.exists() && checkDemoFolder) {
file = new File(getDemoDocumentsDir(), relativePath)
}
return file
}
protected function makeSubDirPath(diw : IDocumentInfoWrapper) : String {
var subDirPath = diw.getSubDirForDocument()
var dirDoc = new File(getDocumentsDir() + subDirPath)
if (not dirDoc.Directory) {
dirDoc.mkdirs()
}
return subDirPath
}
private static function getAbsolutePath(path : String, rootPath : String) : String {
var retVal = path
if (path.startsWith("\\") || path.startsWith("/") || (path.length() > 1 && path.charAt(1) == ":" as char)) {
retVal = path
} else {
retVal = rootPath + File.separator + path
}
try {
retVal = (new File(retVal)).getCanonicalPath()
} catch (e : IOException) {
throw new RuntimeException("Could not get absolute path from relative path: ${path}", e)
}
return retVal.replaceAll("\\\\","/")
}
}
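I have looked at grep, pcregrep, sed and awk. The folder I'm searching is very large, so I'm trying to return all the required data in one command rather than running four commands and traversing the folder multiple times.
I found awk to be the most applicable, but I have very limited experience with all of the programs I mentioned, and I am not authorized to install pcregrep in my environment, so I can't use it.
Here is my attempt with awk so far. It is wrong and probably not done well, so be gentle :)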
awk '{
if(/new[[:space:]]*/) {
line1=NR;
code1=$0;
} if(/File\(\)/) {
count[$0]++;
line2=NR;
if(line1 != line2) {
code2=$0;
printf "Found on lines %d, %d, code = %s %s \nNumber of occurrences = %d", line1, line2, code1, code2, count[$0]
} else {
printf "Found on line %d, code = %s \nNumber of occurrences = %d", line1, code1, count[$0]
}
}
}' test.txt
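I know my occurrence count is wrong, because I'm counting occurrences per matched line rather than the total for the file. I'm also getting some odd output, shown below: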
File()n lines 1, 2, code = new
Number of occurrences = 1
ound on line 3, code = new File()
Number of occurrences = 1
File()n lines 4, 8, code = new
Number of occurrences = 2
ound on line 9, code = File() new
Number of occurrences = 1
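Here code2 overwrites the first few characters of the printed text instead of being printed where I expected.
The expected output would be something like: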
test.txt (Filename)
5 (number of occurrences of new File() pattern)
new File() Found on lines 1 & 2
new File() Found on line 3
new File() Found on lines 4 & 9
new File() Found on line 10
new File() Found on line 10
or something similar.
The output of cat -vte test.txt is:
new^M$
File()^M$
new File()^M$
new ^M$
^M$
^M$
^M$
File()^M$
File() new^M$
new File() test new File()
Any help would be greatly appreciated.
Solution
You can use this awk command:
awk -v msg='new File() Found on line ' 'BEGIN {print ARGV[1], "(Filename)"} {while(match($0, /new[[:blank:]]+File\(\)/)) {print msg NR; ++n; $0 = substr($0, RSTART+RLENGTH)}} /new[[:blank:]]*$/ {p = NR; next} p && NF {if (/^[[:blank:]]*File\(\)/) {print msg p, "&", NR; ++n} p = 0} END {print n, "(number of occurrences of new File() pattern)"}' test.txt
test.txt (Filename)
new File() Found on line 1 & 2
new File() Found on line 3
new File() Found on line 4 & 8
new File() Found on line 10
new File() Found on line 10
5 (number of occurrences of new File() pattern)
A more readable form:
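awk -v msg='new File() Found on line ' '
BEGIN {print ARGV[1], "(Filename)"}
{
    # report every "new File()" that sits on a single line
    while(match($0, /new[[:blank:]]+File\(\)/)) {
        print msg NR
        ++n
        $0 = substr($0, RSTART+RLENGTH)   # keep scanning the rest of the line
    }
}
/new[[:blank:]]*$/ {
    p = NR        # line ends with "new": remember its line number
    next
}
p && NF {
    # first non-blank line after a trailing "new"
    if (/^[[:blank:]]*File\(\)/) {
        print msg p, "&", NR
        ++n
    }
    p = 0
}
END {
    print n, "(number of occurrences of new File() pattern)"
}' test.txt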
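Two points from the question are not covered by the command above: the cat -vte output shows ^M (carriage returns) at the end of most lines, and the search is meant to run over many files. The carriage returns are the likely reason the question's printf output overwrote itself, and they would also stop /new[[:blank:]]*$/ from matching, since \r is not part of [[:blank:]]. What follows is only a sketch under assumptions, not the answer's own method: it strips a trailing \r inside awk, scans a copy of each line instead of rewriting $0, and uses FILENAME and FNR so that each hit is tagged with its file. The second file other.txt, the output wording, and the count array are made up for illustration.

```shell
# Recreate the question's test.txt with CRLF endings, plus a hypothetical second file.
tmp=$(mktemp -d)
printf 'new\r\nFile()\r\nnew File()\r\nnew \r\n\r\n\r\n\r\nFile()\r\nFile() new\r\nnew File() test new File()' > "$tmp/test.txt"
printf 'x = new File()\nnew\n\nFile()\n' > "$tmp/other.txt"

out=$(cd "$tmp" && awk '
FNR == 1 { p = 0 }                  # new file: forget any pending "new"
{ sub(/\r$/, "") }                  # drop a trailing carriage return, if present
{
    line = $0                       # scan a copy so later rules see the whole line
    while (match(line, /new[[:blank:]]+File\(\)/)) {
        print FILENAME ": new File() found on line " FNR
        count[FILENAME]++
        line = substr(line, RSTART + RLENGTH)
    }
}
/new[[:blank:]]*$/ { p = FNR; next }   # line ends with "new": remember where
p && NF {                              # first non-blank line after it
    if (/^[[:blank:]]*File\(\)/) {
        print FILENAME ": new File() found on lines " p " & " FNR
        count[FILENAME]++
    }
    p = 0
}
END {
    # per-file totals; files with no hit are not listed, and the order is unspecified
    for (f in count) print f ": " count[f] " occurrence(s)"
}' test.txt other.txt)

printf '%s\n' "$out"
rm -rf "$tmp"
```

To cover a whole folder in one traversal, the same awk program could be handed to find with -exec awk '...' {} +. In that case awk may be started more than once for very long file lists, so the END totals are printed per invocation.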