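awk - Output a file based on the data in a specific column
Question
Given the input file below, I want to generate the desired output.
I am trying to work out how to get exactly that output with the code shown further down.
The values in column 2 of the input file need to go into columns 21 to 80 of the output file, filling that whole range.
Desired output: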
X52152              1214-1216,1218-1221,1233,1222,1245,1223,1246,1249,1251,
X52152              1224-1232,1234-1243,1247,1250,1253-1254,1332,1331,1333-1336,
X52152              1338,1337,1339-1340,1467.
X52155              1215-1216,1218-1221,1233,1222,1245,1223,1246,1249,1251,1248,
X52155              1224-1232,1234-1243,1247,1250,1253-1254,1332,1331,1333-1336,
X52155              1338,1337,1339-1341.
This is the code I am using:
awk '
function range_to_out() {
    out=(out sep (start == last ? start : (start "-" last)))
}
function print_out() {
    printf "%s %s\n", p1, out","
}
NR == 1 { start=last=$2; p1=$1; next }
{
    if ($2 == last+1) { last=$2 } else {
        range_to_out(); sep=","; start=last=$2
    }
}
$1 != p1 || length(out) > 50 { print_out(); sep=out=""; p1=$1 }
END { range_to_out(); print_out() }
' file
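This question is similar to an earlier one of mine, for which I got code from Mr. Glenn Jackman. Here is his code. It works perfectly with other input files that have a single column.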
awk '
function printrange() { print start (start == last ? "" : "-" last) }
NR == 1 {start=last=$1; next}
$1 == last+1 {last=$1; next}
{printrange(); start=last=$1}
END {printrange()}
' file | paste -sd" " | fold -sw 60 | tr ' ' ',' | sed 's/^/111111 /'
Thanks in advance.
Input file
X52152 1214
X52152 1215
X52152 1216
X52152 1218
X52152 1219
X52152 1220
X52152 1221
X52152 1233
X52152 1222
X52152 1245
X52152 1223
X52152 1246
X52152 1249
X52152 1251
X52152 1224
X52152 1225
X52152 1226
X52152 1227
X52152 1228
X52152 1229
X52152 1230
X52152 1231
X52152 1232
X52152 1234
X52152 1235
X52152 1236
X52152 1237
X52152 1238
X52152 1239
X52152 1240
X52152 1241
X52152 1242
X52152 1243
X52152 1247
X52152 1250
X52152 1253
X52152 1254
X52152 1332
X52152 1331
X52152 1333
X52152 1334
X52152 1335
X52152 1336
X52152 1338
X52152 1337
X52152 1339
X52152 1340
X52152 1467
X52155 1215
X52155 1216
X52155 1218
X52155 1219
X52155 1220
X52155 1221
X52155 1233
X52155 1222
X52155 1245
X52155 1223
X52155 1246
X52155 1249
X52155 1251
X52155 1248
X52155 1224
X52155 1225
X52155 1226
X52155 1227
X52155 1228
X52155 1229
X52155 1230
X52155 1231
X52155 1232
X52155 1234
X52155 1235
X52155 1236
X52155 1237
X52155 1238
X52155 1239
X52155 1240
X52155 1241
X52155 1242
X52155 1243
X52155 1247
X52155 1250
X52155 1253
X52155 1254
X52155 1332
X52155 1331
X52155 1333
X52155 1334
X52155 1335
X52155 1336
X52155 1338
X52155 1337
X52155 1339
X52155 1340
X52155 1341
Answer
Your awk code could look like this:
function print_stuff(label, string,    t, i) {
    # abuse $0 as it makes life easy: save the current record,
    # then let awk split the collected values into fields
    t = $0; $0 = string
    # replace values with "-" if a-1,a,a+1
    # (i runs through NF so that a list ending in a two-value run is collapsed too)
    for (i = 2; i <= NF; ++i) {
        if ($i == $(i-1)+1 && $i == $(i+1)-1) $i = "-"
        else if ($(i-1) == "-" && $i == $(i+1)-1) $i = "-"
        else if ($i == $(i-1)+1) $i = "- " $i
    }
    # substitute all " - - - " with "-" and all " " with ","
    gsub(/ [ -]+/, "-"); gsub(/ /, ",")
    # print in chunks of at most 60 characters, each after a 20-character label
    while (length($0) >= 60) {
        match(substr($0, 1, 60), /,[^,]*$/)
        printf "%-20s", label; print substr($0, 1, RSTART)
        $0 = substr($0, RSTART+1)
    }
    printf "%-20s", label; print $0 "."
    $0 = t    # restore the original record
}
{ gsub(/\r/, "", $0) }                           # get rid of the carriage return
(NR == 1) { a = $1; b = $2; next }               # initialize
(a == $1) { b = b " " $2; next }                 # append values
(a != $1) { print_stuff(a, b); a = $1; b = $2 }  # print
END { print_stuff(a, b) }                        # print last
This outputs:
$ awk -f main.awk <file>
X52152              1214-1216,1218-1221,1233,1222,1245,1223,1246,1249,1251,
X52152              1224-1232,1234-1243,1247,1250,1253-1254,1332,1331,1333-1336,
X52152              1338,1337,1339-1340,1467.
X52155              1215-1216,1218-1221,1233,1222,1245,1223,1246,1249,1251,1248,
X52155              1224-1232,1234-1243,1247,1250,1253-1254,1332,1331,1333-1336,
X52155              1338,1337,1339-1341.
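If the field-rewriting step is hard to follow, it may help to try it on its own. The following is only a minimal sketch: collapse_runs is a hypothetical helper name, and it assumes the input is a single line of space-separated non-negative integers. It lets the loop run through the last field (i <= NF), which is an assumption made here so that a list ending in a two-value run is collapsed as well.

```shell
# collapse_runs: hypothetical helper, for illustration only.
# Reads lines of space-separated integers and prints each one with
# every run of consecutive values collapsed to "first-last".
collapse_runs() {
    awk '{
        for (i = 2; i <= NF; ++i) {
            # value sits between its predecessor and successor: middle of a run
            if ($i == $(i-1)+1 && $i == $(i+1)-1) $i = "-"
            # previous field is already a "-" marker and the run continues
            else if ($(i-1) == "-" && $i == $(i+1)-1) $i = "-"
            # value only follows its predecessor: end of a two-value run
            else if ($i == $(i-1)+1) $i = "- " $i
        }
        gsub(/ [ -]+/, "-")   # " - " or " - - " becomes a single "-"
        gsub(/ /, ",")        # the remaining spaces separate list items
        print
    }'
}

echo "1214 1215 1216 1218 1219 1233 1222" | collapse_runs
# prints 1214-1216,1218-1219,1233,1222
```

A "-" marker evaluates to 0 in a numeric comparison, which is why a marked field never looks like the predecessor of the next value. Negative input values would confuse that test, so they are outside what this sketch handles.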