How do I remove the value of the third column from the fourth column of a CSV file (if it is present)?

Problem description

Ubuntu 16.04, Bash 4.3.48

I want to remove the value of column 3 from column 4 if it is present there, including the space that follows the value.

Before: "Acura","CL","2.2","2.2 2dr Coupe","FWD","Automatic","Gasoline"
After:  "Acura","CL","2.2","2dr Coupe","FWD","Automatic","Gasoline"

Before: "Acura","CL","2.2 Premium","2.2 Premium 2dr Coupe","FWD","Manual","Gasoline"
After:  "Acura","CL","2.2 Premium","2dr Coupe","FWD","Manual","Gasoline"   

I tried using awk as suggested:

root@0000 ~ # awk 'BEGIN{FS=OFS=","} {sub($3,"",$4)} 1' data-one-makes-models.csv > temp; head -n5 temp
"make","model","trim","style","drivetrain","transmission","fueltype"
"Acura","CL","2.2","2.2 2dr Coupe","FWD","Automatic","Gasoline"
"Acura","CL","2.2","2.2 2dr Coupe","FWD","Manual","Gasoline"
"Acura","CL","2.2 Premium","2.2 Premium 2dr Coupe","FWD","Automatic","Gasoline"
"Acura","CL","2.2 Premium","2.2 Premium 2dr Coupe","FWD","Manual","Gasoline"  

Am I redirecting the output correctly, or should I restructure the command?

Tags: bash, awk, sed

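Solution

In your code you are using , as the field separator, but your fields are actually separated by ",", so just change the FS and OFS settings to match your data: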

$ awk 'BEGIN{FS=OFS="\",\""} {sub($3,"",$4)} 1' file
"Acura","CL","2.2"," 2dr Coupe","FWD","Automatic","Gasoline"
"Acura","CL","2.2 Premium"," 2dr Coupe","FWD","Manual","Gasoline"
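To see why the original command changed nothing, it can help to print what awk takes $3 and $4 to be under each separator. The following is a small sketch using one sample line from the question; the square brackets are added only to make the field boundaries visible:

```shell
line='"Acura","CL","2.2","2.2 2dr Coupe","FWD","Automatic","Gasoline"'

# With FS=",", the double quotes remain part of each field.
printf '%s\n' "$line" | awk -F',' '{print "[" $3 "] [" $4 "]"}'

# With FS="\",\"", the quotes between fields are consumed by the separator.
printf '%s\n' "$line" | awk 'BEGIN{FS="\",\""} {print "[" $3 "] [" $4 "]"}'
```

The first command prints ["2.2"] ["2.2 2dr Coupe"] and the second prints [2.2] [2.2 2dr Coupe]. In the first case the regexp passed to sub() is "2.2" with the quotes included, and that never occurs in "2.2 2dr Coupe" because the 2.2 there is followed by a space rather than a closing quote, so nothing is replaced.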

And to get rid of the space left at the start of $4, include the blank in the regexp:

$ awk 'BEGIN{FS=OFS="\",\""} {sub($3" *","",$4)} 1' file
"Acura","CL","2.2","2dr Coupe","FWD","Automatic","Gasoline"
"Acura","CL","2.2 Premium","2dr Coupe","FWD","Manual","Gasoline"

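That is not robust, though: because $3 is used as a regexp, RE metacharacters such as . will be treated as metacharacters rather than as literal characters: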

$ echo '"Acura","CL","2.2","Big 12324 Coupe","FWD","Automatic","Gasoline"' |
    awk 'BEGIN{FS=OFS="\",\""} {sub($3,"",$4)} 1'
"Acura","CL","2.2","Big 14 Coupe","FWD","Automatic","Gasoline"

To be robust, you should really do string operations rather than regexp operations:

$ awk 'BEGIN{FS=OFS="\",\""} s=index($4,$3){$4=substr($4,1,s-1) substr($4,s+length($3)); gsub(/ +/," ",$4); gsub(/^ | $/,"",$4)} 1' file
"Acura","CL","2.2","2dr Coupe","FWD","Automatic","Gasoline"
"Acura","CL","2.2 Premium","2dr Coupe","FWD","Manual","Gasoline"
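As a quick check, the problematic line from the regexp example can be fed to the same string-based command. Since index() looks for the literal text 2.2, which does not occur in Big 12324 Coupe, the line should pass through unchanged:

```shell
input='"Acura","CL","2.2","Big 12324 Coupe","FWD","Automatic","Gasoline"'

# index() does a plain substring search, so the "." in $3 is not a wildcard here.
printf '%s\n' "$input" |
    awk 'BEGIN{FS=OFS="\",\""} s=index($4,$3){$4=substr($4,1,s-1) substr($4,s+length($3)); gsub(/ +/," ",$4); gsub(/^ | $/,"",$4)} 1'
```

Here index() returns 0, the action is skipped, and awk prints the record as it was read.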

If you only want to remove $3 when it appears at the start of $4, just change s=index($4,$3) to (s=index($4,$3))==1.
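Written out in full, the start-only variant might look like the sketch below. The first input line is taken from the question; the second is a hypothetical line, made up here only to show a trim value that appears in the middle of $4:

```shell
# Line 1 is from the question; line 2 is a hypothetical example
# where the trim value is not at the start of the style field.
printf '%s\n' \
  '"Acura","CL","2.2","2.2 2dr Coupe","FWD","Automatic","Gasoline"' \
  '"Acura","CL","2.2","Coupe 2.2 2dr","FWD","Manual","Gasoline"' |
awk 'BEGIN{FS=OFS="\",\""}
     (s=index($4,$3))==1 {
         # s is 1 here, so this keeps only the text after $3
         $4 = substr($4,1,s-1) substr($4,s+length($3))
         gsub(/ +/," ",$4)      # collapse runs of spaces
         gsub(/^ | $/,"",$4)    # trim one leading/trailing space
     }
     1'
```

The first line comes out with 2dr Coupe in the fourth column, while the second is printed unchanged because index() returns 7 rather than 1 for it.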

