Parse this string to extract specific information in bash

Problem description

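I'm writing a bash script, and part of it needs to retrieve the titles and due dates of assignments from a string by parsing out the useful information and discarding the rest.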

I've experimented with sed a bit, but I can't get it to work the way I want.

One thing I tried in the script was LABS=$(sed 's/<a.*/a>//' $LABS)
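A likely reason that sed call fails: in s/<a.*/a>//, the slash inside /a> ends the pattern early, so sed reads a> as the replacement and the trailing / as a flag, which GNU sed rejects as an unknown option to the s command. Separately, the unquoted $LABS is handed to sed as file names rather than as the text to edit. A minimal sketch of a corrected call, assuming the text is already held in the LABS variable (the sample below is just the first lab from the question), switches the delimiter to | and feeds the text on standard input. Note that this only strips the <a ...>...</a> links; the lab names and dates still have to be picked out afterwards.

```shell
# Sample text (the first lab from the question); in the real script,
# LABS would already hold the whole string.
LABS='["<a href=\"https://classroom.github.com/a/WOWerwCz\">lab01</a>",
    "Lab 1", bblearn_content_base + "/resources/Labs/1.html",
    "7/3/2019", 1,'

# Use "|" as the s/// delimiter so the "/" in "</a>" needs no escaping,
# and send the text to sed on stdin instead of naming it as a file.
LABS=$(printf '%s\n' "$LABS" | sed 's|<a.*/a>||')

printf '%s\n' "$LABS"
# ["",
#     "Lab 1", bblearn_content_base + "/resources/Labs/1.html",
#     "7/3/2019", 1,
```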

["<a href=\"https://classroom.github.com/a/WOWerwCz\">lab01</a>",
    "Lab 1", bblearn_content_base + "/resources/Labs/1.html",
    "7/3/2019", 1,
    "<a href=\"https://classroom.github.com/a/k3dVwTMy\">lab02</a>",
    "Lab 2", bblearn_content_base + "/resources/Labs/2.html",
    "7/12/2019", 1,
    "<a href=\"https://classroom.github.com/a/z1chUDd4\">lab03</a>",
    "Lab 3", bblearn_content_base + "/resources/Labs/3.html",
    "7/20/2019", 1,
    "<a href=\"https://classroom.github.com/a/iHbdXqs4\">lab04</a>",
    "Lab 4", bblearn_content_base + "/resources/Labs/4.html",
    "7/31/2019", 1,
    "<a href=\"https://classroom.github.com/a/WgyMWn68\">lab05</a>",
    "Lab 5", bblearn_content_base + "/resources/Labs/5.html",
    "8/5/2019", 1,
    "<a href=\"https://classroom.github.com/a/4anRjuDB\">lab06</a>",
    "Lab 6", bblearn_content_base + "/resources/Labs/6.html",
    "8/10/2019", 1,
    "<a href=\"https://classroom.github.com/a/qTyBR1R8\">lab07</a>",
    "Lab 7", bblearn_content_base + "/resources/Labs/7.html",
    "8/16/2019", 1,
    "<a href=\"https://classroom.github.com/a/UIJsxfA5\">lab08</a>",
    "Lab 8", bblearn_content_base + "/resources/Labs/8.html",
    "8/22/2019", 1,
    "<a href=\"https://classroom.github.com/a/XYUPMOiC\">lab09</a>",
    "Lab 9", bblearn_content_base + "/resources/Labs/9.html",
    "8/25/2019", 1,
    "<a href=\"https://classroom.github.com/a/ZJQ70-dy\">lab10</a>",
    "Lab 10", bblearn_content_base + "/resources/Labs/10.html",
    "9/1/2019", 1
]

Basically, I need to remove everything except each "Lab x" and the date that follows it, but I'm struggling to figure out how.

Tags: bash, parsing, awk, grep

Solution


I assume you could write all of your processing in a single awk script, but here is a small awk script that does this particular task:

script.awk

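BEGIN {FPAT="\"[^\"]+";}  # GNU awk only: each field is a " followed by non-quote characters, so $1 is the first quoted value with its leading "
NR % 3 == 0 { # on every 3rd line (the line holding the date)
    print labName, substr($1,2); # print the lab name saved from the previous line and the date, without the leading "
}
{             # on every line
    labName = substr($1,2); # remember $1 without its leading " (on the line before a date, this is "Lab N")
}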

To run it:

awk -f script.awk input.txt

Or as a one-liner:

awk 'BEGIN {FPAT="\"[^\"]+";}NR % 3 == 0 {print labName, substr($1,2);}{labName = substr($1,2);}' input.txt
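FPAT is a GNU awk (gawk) extension. Where awk is another implementation, such as mawk or BSD awk, the assignment just creates an ordinary variable and fields are still split on whitespace, so the commands above will not print the expected lines. A sketch that should work in any POSIX awk keeps the same every-third-line logic but uses the double quote itself as the field separator, which makes the text inside the first pair of quotes $2. The sample below is only the first two labs from the question; on the real file you would run awk -F'"' '...' input.txt directly.

```shell
# Sample input: the first two labs from the question.
sample='["<a href=\"https://classroom.github.com/a/WOWerwCz\">lab01</a>",
    "Lab 1", bblearn_content_base + "/resources/Labs/1.html",
    "7/3/2019", 1,
    "<a href=\"https://classroom.github.com/a/k3dVwTMy\">lab02</a>",
    "Lab 2", bblearn_content_base + "/resources/Labs/2.html",
    "7/12/2019", 1
]'

# With -F'"', the text inside the first pair of quotes on a line is $2.
# Every 3rd line holds a date; the line before it holds the lab name.
result=$(printf '%s\n' "$sample" |
  awk -F'"' 'NR % 3 == 0 { print name, $2 } { name = $2 }')

printf '%s\n' "$result"
# Lab 1 7/3/2019
# Lab 2 7/12/2019
```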

Output:

Lab 1 7/3/2019
Lab 2 7/12/2019
Lab 3 7/20/2019
Lab 4 7/31/2019
Lab 5 8/5/2019
Lab 6 8/10/2019
Lab 7 8/16/2019
Lab 8 8/22/2019
Lab 9 8/25/2019
Lab 10 9/1/2019
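The awk approach above depends on the input having exactly three lines per lab. If that layout might change, an alternative is to match the wanted values by content: keep every quoted Lab N title and every quoted m/d/yyyy date, then pair them up. This sketch assumes titles and dates always alternate in that order, and it relies on grep -o, which GNU and BSD grep support but which is not part of POSIX.

```shell
# Sample input: the first two labs from the question.
sample='["<a href=\"https://classroom.github.com/a/WOWerwCz\">lab01</a>",
    "Lab 1", bblearn_content_base + "/resources/Labs/1.html",
    "7/3/2019", 1,
    "<a href=\"https://classroom.github.com/a/k3dVwTMy\">lab02</a>",
    "Lab 2", bblearn_content_base + "/resources/Labs/2.html",
    "7/12/2019", 1
]'

# 1. grep -oE prints each quoted "Lab N" title or m/d/yyyy date on its own line.
# 2. paste joins every two consecutive lines (title, date) with a space.
# 3. tr removes the surrounding double quotes.
result=$(printf '%s\n' "$sample" |
  grep -oE '"(Lab [0-9]+|[0-9]{1,2}/[0-9]{1,2}/[0-9]{4})"' |
  paste -d ' ' - - |
  tr -d '"')

printf '%s\n' "$result"
# Lab 1 7/3/2019
# Lab 2 7/12/2019
```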
