awk - Awk regex substring in column
Problem description
I have a data file with comma-separated fields:
379565,COFFEE,297678, ,21,21,I, 6, 10.00, , , ,01-DISPLAY REVENUE ,17-HOUSE ACCOUNT ,AD#05260540 ,YES ,N,N,20210625,
380685,COMICS,297634, ,21,21,I, 3, 21.00,MAIN NEWS , ,BATHS ,01-DISPLAY REVENUE ,17-HOUSE ACCOUNT ,AD# IS 05240526 ,YES ,N,N,20210625,
337708,COMICS,298047, 84558,21,21,I, 6, 21.00, , ,SCHOOL PAGE ,01-DISPLAY REVENUE ,17-HOUSE ACCOUNT , ,CMYK ,N,N,20210625
When column 4 only has spaces, the 8-digit ad number needs to be pulled from column 15.
This awk checks to see if column 4 is only spaces and, if so, copies column 15 to 4:
awk -F, '{ if ($4 ~ /^[[:space:]][[:space:]][[:space:]][[:space:]][[:space:]][[:space:]]/) {OFS=",";{$4=$15} print} else print}'
How can I extract just the 8-digit ad number (without the "AD#" or "AD# IS" parts) from column 15 and put it into column 4?
Expected outcome:
379565,COFFEE,297678,05260540,21,21,I, 6, 10.00, , , ,01-DISPLAY REVENUE ,17-HOUSE ACCOUNT ,AD#05260540 ,YES ,N,N,20210625,
380685,COMICS,297634,05240526,21,21,I, 3, 21.00,MAIN NEWS , ,BATHS ,01-DISPLAY REVENUE ,17-HOUSE ACCOUNT ,AD# IS 05240526 ,YES ,N,N,20210625,
337708,COMICS,298047, 84558,21,21,I, 6, 21.00, , ,SCHOOL PAGE ,01-DISPLAY REVENUE ,17-HOUSE ACCOUNT , ,CMYK ,N,N,20210625
Solution
You may use this awk:
awk 'BEGIN{FS=OFS=","} $4 ~ /^[[:blank:]]*$/ {$4 = $15; gsub(/[^[:digit:]]+/, "", $4)} 1' file
379565,COFFEE,297678,05260540,21,21,I, 6, 10.00, , , ,01-DISPLAY REVENUE ,17-HOUSE ACCOUNT ,AD#05260540 ,YES ,N,N,20210625,
380685,COMICS,297634,05240526,21,21,I, 3, 21.00,MAIN NEWS , ,BATHS ,01-DISPLAY REVENUE ,17-HOUSE ACCOUNT ,AD# IS 05240526 ,YES ,N,N,20210625,
337708,COMICS,298047, 84558,21,21,I, 6, 21.00, , ,SCHOOL PAGE ,01-DISPLAY REVENUE ,17-HOUSE ACCOUNT , ,CMYK ,N,N,20210625
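The condition `$4 ~ /^[[:blank:]]*$/` in the command above is anchored at both ends, so it matches a field that is empty or holds any number of spaces or tabs and nothing else. The attempt in the question required at least six whitespace characters and had no end anchor. A small sketch to check the pattern in isolation; the four input values (empty, one space, six spaces, a padded number) are made up for illustration:

```shell
# Hypothetical test values: an empty field, one blank, six blanks, a padded number.
blank_check=$(printf '%s\n' '' ' ' '      ' ' 84558' | awk '
{
    # Same pattern as the answer, applied to the whole line instead of $4.
    verdict = ($0 ~ /^[[:blank:]]*$/) ? "blank" : "not blank"
    printf "[%s] %s\n", $0, verdict
}')
printf '%s\n' "$blank_check"
```

Only the last value, which contains digits, should be reported as not blank.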
An expanded form:
awk '
BEGIN { FS = OFS = "," }
$4 ~ /^[[:blank:]]*$/ {
    $4 = $15
    gsub(/[^[:digit:]]+/, "", $4)
}
1' file
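The `gsub` call above strips every non-digit character from the copied value, so it relies on the ad number being the only digits in column 15. If that column could ever hold other digits (an assumption; the sample data does not show this), a variant that takes only the first run of 8 digits with `match()` and `substr()` may be safer. A minimal sketch, using a hypothetical sample file named `ads.csv` built from the rows in the question:

```shell
# Hypothetical sample file; the name ads.csv is an assumption for illustration.
cat > ads.csv <<'EOF'
379565,COFFEE,297678, ,21,21,I, 6, 10.00, , , ,01-DISPLAY REVENUE ,17-HOUSE ACCOUNT ,AD#05260540 ,YES ,N,N,20210625,
380685,COMICS,297634, ,21,21,I, 3, 21.00,MAIN NEWS , ,BATHS ,01-DISPLAY REVENUE ,17-HOUSE ACCOUNT ,AD# IS 05240526 ,YES ,N,N,20210625,
337708,COMICS,298047, 84558,21,21,I, 6, 21.00, , ,SCHOOL PAGE ,01-DISPLAY REVENUE ,17-HOUSE ACCOUNT , ,CMYK ,N,N,20210625
EOF

result=$(awk '
BEGIN { FS = OFS = "," }
# Only touch rows whose column 4 is empty or blank.
$4 ~ /^[[:blank:]]*$/ {
    # Locate the first run of 8 digits in column 15. The digit class is
    # spelled out 8 times instead of using {8}, which some older awks lack.
    if (match($15, /[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]/))
        $4 = substr($15, RSTART, RLENGTH)
}
1' ads.csv)
printf '%s\n' "$result"
```

When column 15 contains no 8-digit run, `match()` returns 0 and the row is printed unchanged, whereas the `gsub` version would replace column 4 with whatever digits it found (possibly none).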