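awk - Censoring a file
Question
I want to use awk to modify a text file. In the modified file, every word that starts with "te" or "Te" and contains no digits should be replaced with "yyyyy", as a way of censoring the file.
So, for example, a file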
Hello everyone,
today is a great day to get tested by mr. Tenet here!
Don't te11 anyone!
should become
Hello everyone,
today is a great day to get yyyyy by mr. yyyyy here!
Don't te11 anyone!
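Then I want to add information about the modification: how many lines the file has and how many of them were modified (do I need a for loop for this?).
This information should be added at the end of the file, like this: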
The file has 3 lines and 2 out of these were modified.
I'm quite lost, so any help is appreciated. Thanks
Solution
awk 'BEGIN{ IGNORECASE=1; m1=0; m2=0 }
{ x=gsub(/te[a-zA-Z]* /,"yyyyy ",$0); m1+=(x!=0); m2+=x; print }
END{ print "The file has " NR " lines and " m1 " out of these were modified, with " m2 " changes"}' inputfile
或者
awk 'BEGIN{ IGNORECASE=1; m1=0; m2=0 }
{ x=gsub(/te[[:alpha:]]* /,"yyyyy ",$0); m1+=(x!=0); m2+=x; print }
END{ print "The file has " NR " lines and " m1 " out of these were modified, with " m2 " changes"}' inputfile
If you don't need the modified text in the output, remove the print statement from the second line of the script.
Output:
Hello everyone,
today is a great day to get yyyyy by mr. yyyyy here!
Don't te11 anyone!
The file has 3 lines and 1 out of these were modified, with 2 changes
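The question also asks for the summary to end up in the file itself, while the commands above only print to standard output. No explicit for loop is needed for the counting: awk already runs the main block once per line, NR holds the line count in END, and gsub() returns the number of replacements it made. A minimal sketch, assuming a non-GNU awk (IGNORECASE is a gawk extension, so the case-insensitivity is spelled out as [Tt][Ee]) and assuming it is acceptable to overwrite the file through a temporary file (inputfile.tmp is a name chosen here for illustration):

```shell
#!/bin/sh
# Sample input from the question; "inputfile" is just an example name.
cat > inputfile <<'EOF'
Hello everyone,
today is a great day to get tested by mr. Tenet here!
Don't te11 anyone!
EOF

# [Tt][Ee] replaces gawk's IGNORECASE=1, so this also runs under mawk or BWK awk.
# awk already loops over the lines; gsub() returns how many replacements it made.
awk 'BEGIN { m1 = 0; m2 = 0 }
{ x = gsub(/[Tt][Ee][a-zA-Z]* /, "yyyyy "); m1 += (x != 0); m2 += x; print }
END { print "The file has " NR " lines and " m1 " out of these were modified, with " m2 " changes" }
' inputfile > inputfile.tmp && mv inputfile.tmp inputfile

# The file now holds the censored text followed by the summary line.
cat inputfile
```

Writing to a temporary file and renaming it avoids truncating inputfile while awk is still reading it; with GNU awk, gawk -i inplace is an alternative.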
Edit: because of a comment on my answer (Teheran!),
I changed the input file to:
Hello everyone,
today is a great day, to get tested by mr. Tenet here!
time to light some external fire in Teheran!
Don't te11 anyone!
and the script (GNU awk; \< anchors the match at the start of a word and \> requires the word to end right after the letters, so te11 is left alone):
awk 'BEGIN{ IGNORECASE=1; m1=0; m2=0 }
{ x=gsub(/\<te[[:alpha:]]*\>/,"yyyyy",$0); m1+=(x!=0); m2+=x; print }
END{ print "The file has " NR " lines and " m1 " out of these were modified, with " m2 " changes"}' inputfile
This seems to work fine:
Hello everyone,
today is a great day, to get yyyyy by mr. yyyyy here!
time to light some external fire in yyyyy!
Don't te11 anyone!
The file has 4 lines and 2 out of these were modified, with 3 changes
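The script above relies on GNU awk extensions (IGNORECASE and the \< and \> word anchors). If only a POSIX awk such as mawk is available, one possible sketch is to walk each line word by word with match() and test each word with tolower(). The helper name censor is hypothetical, and a "word" is assumed to be a run of ASCII letters, digits and underscores:

```shell
#!/bin/sh
# censor FILE (hypothetical helper): print FILE with every word that starts
# with "te" in any case and contains only letters replaced by "yyyyy",
# followed by a summary line. Uses only POSIX awk features.
censor() {
    awk '
    {
        out = ""; rest = $0; n = 0
        # A "word" here is a run of ASCII letters, digits or underscores
        while (match(rest, /[A-Za-z0-9_]+/)) {
            word = substr(rest, RSTART, RLENGTH)
            # "te11" fails this test because only letters may follow "te"
            if (tolower(word) ~ /^te[a-z]*$/) { word = "yyyyy"; n++ }
            # Copy the separator before the word unchanged, then the word
            out = out substr(rest, 1, RSTART - 1) word
            rest = substr(rest, RSTART + RLENGTH)
        }
        print out rest                 # keep trailing punctuation such as "!"
        if (n > 0) modified++
        changes += n
    }
    END {
        printf "The file has %d lines and %d out of these were modified, with %d changes\n",
               NR, modified, changes
    }' "$1"
}

cat > inputfile <<'EOF'
Hello everyone,
today is a great day, to get tested by mr. Tenet here!
time to light some external fire in Teheran!
Don't te11 anyone!
EOF

censor inputfile
```

Rebuilding each line from the untouched separators, instead of assigning to fields such as $3, keeps the original spacing and punctuation; assigning to a field would make awk re-join the line with OFS.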