
Problem description

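I want to use awk to modify a text file. In the modified file, every word that starts with "te" or "Te" and contains no digits should be replaced with "yyyyy", in order to censor the file.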

So, for example, a file containing

Hello everyone,
today is a great day to get tested by mr. Tenet here!
Don't te11 anyone!

should be modified to

Hello everyone,
today is a great day to get yyyyy by mr. yyyyy here!
Don't te11 anyone!

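Then I would like to include information about the modifications, stating how many lines the file has and how many of them were modified (do I need a for loop to do this?).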

This information should be appended to the end of the file, like this:

The file has 3 lines and 2 out of these were modified.

I am quite lost and would appreciate any help. Thanks.

Tags: awk, cycle

Solution


awk 'BEGIN{ IGNORECASE=1; m1=0; m2=0 }
     { x=gsub(/te[a-zA-Z]* /,"yyyyy ",$0); m1+=(x!=0); m2+=x; print }
     END{ print "The file has " NR " lines and " m1 " out of these were modified, with " m2 " changes"}' inputfile

Or

awk 'BEGIN{ IGNORECASE=1; m1=0; m2=0 }
     { x=gsub(/te[[:alpha:]]* /,"yyyyy ",$0); m1+=(x!=0); m2+=x; print }
     END{ print "The file has " NR " lines and " m1 " out of these were modified, with " m2 " changes"}' inputfile

If you do not need the modified text in the output, remove print from the second line of the script.
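
The counting relies on gsub returning the number of substitutions it made on the current line: adding (x != 0) counts the line once, while adding x counts every replacement, so no for loop is needed. A minimal sketch of the summary-only variant follows. The function name summary_only is hypothetical, and [Tt]e is assumed in place of IGNORECASE=1 so that the sketch does not depend on GNU awk:

```shell
# Hypothetical helper: print only the summary line, not the censored text.
summary_only() {
    awk 'BEGIN { changed = 0; total = 0 }
         {
             x = gsub(/[Tt]e[A-Za-z]* /, "yyyyy ")   # x = substitutions made on this line
             changed += (x != 0)                     # count the line once if anything changed
             total += x                              # count every single replacement
         }
         END { printf "The file has %d lines and %d out of these were modified, with %d changes\n",
                      NR, changed, total }' "$@"
}

# Usage with the sample input from the question:
printf '%s\n' 'Hello everyone,' \
    'today is a great day to get tested by mr. Tenet here!' \
    "Don't te11 anyone!" | summary_only
```

On the sample input this prints the same summary line as the full script: 3 lines, 1 modified, 2 changes.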

Output:

Hello everyone,
today is a great day to get yyyyy by mr. yyyyy here!
Don't te11 anyone!
The file has 3 lines and 1 out of these were modified, with 2 changes
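
The question asks for the summary to be appended to the file itself, whereas the command above only prints to standard output. One way to get there, sketched under assumptions: awk writes to a temporary file created with mktemp, and the result is then copied back over the original. The function name censor_in_place is hypothetical, and [Tt]e stands in for IGNORECASE=1 so that the sketch also runs on non-GNU awk:

```shell
# Hypothetical helper: overwrite FILE with its censored text plus the summary line.
censor_in_place() {
    file=$1
    tmp=$(mktemp) || return 1
    if awk 'BEGIN { changed = 0; total = 0 }
            { x = gsub(/[Tt]e[A-Za-z]* /, "yyyyy "); changed += (x != 0); total += x; print }
            END { printf "The file has %d lines and %d out of these were modified, with %d changes\n",
                         NR, changed, total }' "$file" > "$tmp"
    then
        cat "$tmp" > "$file"    # copy back, keeping the original file permissions
    fi
    rm -f "$tmp"
}

# usage: censor_in_place inputfile
```

Writing to a temporary file first matters because redirecting awk's output straight onto its own input file would truncate the file before awk reads it.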

Edit: because of a comment about "Teheran!", I changed the input file to:

Hello everyone,
today is a great day, to get tested by mr. Tenet here!
time to light some external fire in Teheran!
Don't te11 anyone!

and the script to:

awk 'BEGIN{ IGNORECASE=1; m1=0; m2=0 }
     { x=gsub(/\<te[[:alpha:]]*\>/,"yyyyy",$0); m1+=(x!=0); m2+=x; print }
     END{ print "The file has " NR " lines and " m1 " out of these were modified, with " m2 " changes"}' inputfile

With GNU awk this seems to work fine:

Hello everyone,
today is a great day, to get yyyyy by mr. yyyyy here!
time to light some external fire in yyyyy!
Don't te11 anyone!
The file has 4 lines and 2 out of these were modified, with 3 changes
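
Note that IGNORECASE and the \< word-boundary operator are GNU awk extensions, so the scripts above may behave differently under other awk implementations. A minimal portable sketch, assuming a "word" is a run of ASCII letters and digits: it walks through each line with match(), and replaces a word only if it starts with te or Te and consists of letters alone. The function name censor_words and the variable names are hypothetical:

```shell
# Hypothetical helper: censor word by word using only POSIX awk features.
censor_words() {
    awk 'BEGIN { changed = 0; total = 0 }
         {
             out = ""; rest = $0; n = 0
             # Walk through the line one alphanumeric run ("word") at a time.
             while (match(rest, /[A-Za-z0-9]+/)) {
                 word = substr(rest, RSTART, RLENGTH)
                 # Censor only words that start with te/Te and consist of letters alone.
                 if (word ~ /^[Tt]e[A-Za-z]*$/) { word = "yyyyy"; n++ }
                 out = out substr(rest, 1, RSTART - 1) word
                 rest = substr(rest, RSTART + RLENGTH)
             }
             print out rest                 # rest holds any trailing punctuation
             changed += (n > 0); total += n
         }
         END { printf "The file has %d lines and %d out of these were modified, with %d changes\n",
                      NR, changed, total }' "$@"
}

# Usage with the four-line input from the edit:
printf '%s\n' 'Hello everyone,' \
    'today is a great day, to get tested by mr. Tenet here!' \
    'time to light some external fire in Teheran!' \
    "Don't te11 anyone!" | censor_words
```

On this input the sketch leaves te11 and external untouched, keeps the "!" after the censored Teheran, and reports 4 lines, 2 of them modified, with 3 changes.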
