html - How do I grep between two strings?
Problem description
I have a text file:
<span class="html-tag"><script></span></td></tr><tr><td class="line-number" value="1431"></td><td class="line-content"> var awbManifests = {"requestId":"16d1-4451-9b12-f61a87e9cd11","errorMessage":null,"errorCode":null,"success":true,"content":[{"id":"5ec8-444e-9d5b-f7487ce592c2","storeId":"10001","createdDate":1541923869937,"createdBy":"asdf","updatedDate":1541968417296,"updatedBy":"dsa","type":"airwaybill","value":"5468468464568466","logisticTrackingID":"5468468464568466","senderName":"dasdf","senderAddress":"Batuceper","receiverName":"ATIK","receiverAddress":"JL. SRIKATON BARAT\n","manifestList":[{"logisticProviderCode":"asd","blibliAirwayBillNumber":"5468468464568466","status":"DEPARTED FROM TRANSIT [GATEWAY JAKARTA]","timestamp":1541976677000,"additionalInfo":[{"label":"Third Party Tracking ID","value":null,"type":"STRING","description":"Third Party Tracking ID"}]},{"logisticProviderCode":"asdf","blibliAirwayBillNumber":"5468468464568466","status":"RECEIVED AT ORIGIN GATEWAY [GATEWAY JAKARTA]","timestamp":1541976343000,"additionalInfo":[{"label":"Third Party Tracking ID","value":null,"type":"STRING","description":"Third Party Tracking ID"}]},{"logisticProviderCode":"asdf","blibliAirwayBillNumber":"5468468464568466","status":"PROCESSED AT SORTING CENTER [JAKARTA]","timestamp":1541968348000,"additionalInfo":[{"label":"Third Party Tracking ID","value":null,"type":"STRING","description":"Third Party Tracking ID"}]},{"logisticProviderCode":"asdf","blibliAirwayBillNumber":"5468468464568466","status":"RECEIVED AT SORTING CENTER [JAKARTA]","timestamp":1541960930000,"additionalInfo":[{"label":"Third Party Tracking ID","value":null,"type":"STRING","description":"Third Party Tracking ID"}]},{"logisticProviderCode":"asdf","blibliAirwayBillNumber":"5468468464568466","status":"SHIPMENT RECEIVED BY asdf COUNTER OFFICER AT [JAKARTA]","timestamp":1541926728000,"additionalInfo":[{"label":"Third Party Tracking ID","value":null,"type":"STRING","description":"Third Party Tracking 
ID"}]}]}],"pageMetaData":null};</td></tr><tr><td class="line-number" value="1432"></td><td class="line-content"> var ordersTracking = [{"orderItemId":"53000116530","product":null,"shipment":"asdf","airwaybillNumber":"5468468464568466","receiver":null,"receivedDate":null,"relation":null,"status":"Valid","productType":"Regular","eligibleForFeedback":false,"feedback":null,"invalidAWBJiraNumber":"","mismatchAWBJiraNumber":"","isAirwayBillValid":true,"mismatchAirwayBill":false}];
I want everything from var awbManifests = up to the first ; character, so the output should be just the JSON, like this:
{"requestId":"16d1-4451-9b12-f61a87e9cd11","errorMessage":null,"errorCode":null,"success":true,"content":[{"id":"5ec8-444e-9d5b-f7487ce592c2","storeId":"10001","createdDate":1541923869937,"createdBy":"asdf","updatedDate":1541968417296,"updatedBy":"dsa","type":"airwaybill","value":"5468468464568466","logisticTrackingID":"5468468464568466","senderName":"dasdf","senderAddress":"Batuceper","receiverName":"ATIK","receiverAddress":"JL. SRIKATON BARAT\n","manifestList":[{"logisticProviderCode":"asd","blibliAirwayBillNumber":"5468468464568466","status":"DEPARTED FROM TRANSIT [GATEWAY JAKARTA]","timestamp":1541976677000,"additionalInfo":[{"label":"Third Party Tracking ID","value":null,"type":"STRING","description":"Third Party Tracking ID"}]},{"logisticProviderCode":"asdf","blibliAirwayBillNumber":"5468468464568466","status":"RECEIVED AT ORIGIN GATEWAY [GATEWAY JAKARTA]","timestamp":1541976343000,"additionalInfo":[{"label":"Third Party Tracking ID","value":null,"type":"STRING","description":"Third Party Tracking ID"}]},{"logisticProviderCode":"asdf","blibliAirwayBillNumber":"5468468464568466","status":"PROCESSED AT SORTING CENTER [JAKARTA]","timestamp":1541968348000,"additionalInfo":[{"label":"Third Party Tracking ID","value":null,"type":"STRING","description":"Third Party Tracking ID"}]},{"logisticProviderCode":"asdf","blibliAirwayBillNumber":"5468468464568466","status":"RECEIVED AT SORTING CENTER [JAKARTA]","timestamp":1541960930000,"additionalInfo":[{"label":"Third Party Tracking ID","value":null,"type":"STRING","description":"Third Party Tracking ID"}]},{"logisticProviderCode":"asdf","blibliAirwayBillNumber":"5468468464568466","status":"SHIPMENT RECEIVED BY asdf COUNTER OFFICER AT [JAKARTA]","timestamp":1541926728000,"additionalInfo":[{"label":"Third Party Tracking ID","value":null,"type":"STRING","description":"Third Party Tracking ID"}]}]}],"pageMetaData":null}
So far this is all I have managed, but the command does not capture the whole JSON string:
grep -o -P '(?<=var awbManifests = ).*(?=pageMetaData)' test.html
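How can I fix it?
Solution
Please don't use regular expressions to parse HTML. In my opinion you are better off with a tool that can parse both HTML/XML and JSON, such as xidel.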
Select the relevant <td> element node:
xidel -s input.htm -e '//td[@value="1431"]/following-sibling::td'
var awbManifests = {"requestId":"16d1-4451-9b12-f61a87e9cd11",[...],"pageMetaData":null};
Isolate the JSON:
xidel -s input.htm -e '
//td[@value="1431"]/substring-before(substring-after(following-sibling::td,"awbManifests = "),";")
'
#or
xidel -s input.htm -e '
//td[@value="1431"]/extract(following-sibling::td,"awbManifests = (.+);",1)
'
{"requestId":"16d1-4451-9b12-f61a87e9cd11",[...],"pageMetaData":null}
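The substring-after / substring-before pair keeps whatever lies between the first occurrence of a start marker and the first following occurrence of an end marker. The same idea can be sketched with plain POSIX shell parameter expansion. This is only an illustration on a made-up, shortened string (the cell variable is hypothetical), not a replacement for selecting the <td> node with xidel:

```shell
# Hypothetical, shortened stand-in for the text content of the <td> cell.
cell=' var awbManifests = {"requestId":"16d1","pageMetaData":null};'

# Like substring-after(cell, "awbManifests = "):
# drop the shortest prefix that ends with the start marker.
rest=${cell#*"awbManifests = "}

# Like substring-before(rest, ";"):
# drop the longest suffix that starts with ';', i.e. cut at the first ';'.
isolated=${rest%%;*}

printf '%s\n' "$isolated"
# prints {"requestId":"16d1","pageMetaData":null}
```

Both forms cut at the first ; they meet, so a ; inside a JSON string value would end the result early.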
Parse the JSON:
xidel -s input.htm -e '
json(
//td[@value="1431"]/substring-before(substring-after(following-sibling::td,"awbManifests = "),";")
)
'
{
"requestId": "16d1-4451-9b12-f61a87e9cd11",
"errorMessage": null,
"errorCode": null,
"success": true,
"content": [...],
"pageMetaData": null
}
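As a side note on the grep command from the question: its lookahead (?=pageMetaData) ends the match just before that text, which is why the tail of the JSON is missing. Below is a minimal sketch of a pattern that stops at the first ; instead. It assumes GNU grep with -P support, that the whole assignment sits on a single line, and that no ; occurs inside the JSON itself. The page variable is a made-up, shortened stand-in for the real file:

```shell
# Hypothetical one-line miniature of the page; the real file is far larger.
page='<td class="line-content"> var awbManifests = {"requestId":"16d1","pageMetaData":null};</td>'

# Pattern from the question: the greedy .* is cut off right before "pageMetaData".
truncated=$(printf '%s\n' "$page" | grep -o -P '(?<=var awbManifests = ).*(?=pageMetaData)')

# Adjusted pattern: match everything after the marker up to, not including, the first ';'.
whole=$(printf '%s\n' "$page" | grep -o -P '(?<=var awbManifests = )[^;]*')

printf '%s\n' "$truncated" "$whole"
# prints {"requestId":"16d1","
# prints {"requestId":"16d1","pageMetaData":null}
```

This remains a text-level shortcut. It knows nothing about HTML or JSON structure, so the xidel approach above is the more robust choice.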