powershell - How to remove JSON text (including CRLFs) from delimited records in PowerShell
Problem description
Got a weird one: I need to remove JSON text from a tilde-delimited file (the JSON breaks the import because of the CRLF at the end of each JSON line). Example record:
Test Plan Work~Response Status: BadRequest Bad Request,Response Content: {
"trace": "0HM5285F2",
"errors": [
{
"code": "server_error",
"message": "Couldn't access service ",
"moreInfoUrl": null,
"target": {
"type": null,
"name": null
}
}
]
},Request: https://www.test.com Headers: Accept: application/json
SubscriberId:
~87c5de00-5906-4d2d-b65f-4asdfsdfsdfa29~3/17/2020 1:54:08 PM
Or ones like this, which have no JSON but still follow the same pattern I need:
Test Plan Pay Work~Response Status: InternalServerError Internal Server Error,Response Content: Error,Request: https://api.test.com Headers: Accept: application/json
Authorization: Bearer eyJhbGciOiJSUzI1NiIsInR5c
SubscriberId: eb7aee
~9d05b16e-e57b-44be-b028-b6ddsdfsdf62a5~1/20/2021 7:07:53 PM
I need both types to end up as CSV text in this format:
Test Plan Work~Response Status: BadRequest Bad Request~87c5de00-5906-4d2d-b65f-4asdfsdfsdfa29~3/17/2020 1:54:08 PM
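The JSON, including the CRLF at the end of each of its lines, is breaking the import of the data into PowerShell. Any help or insight would be greatly appreciated!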
Solution
PowerShell (or rather, .NET) has two special features in its regex engine that might be perfect for this use case: balancing groups and conditionals!
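Balancing groups are a complex feature that would take a while to explain fully, but in essence they let us keep count of how many times a specific named subexpression has matched: (?<depth>) pushes an empty capture onto the 'depth' group, and (?<-depth>) pops one again, failing if there is nothing left to pop. Applied, it looks like this: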
PS ~> $string = 'Here is text { but wait { it has } nested { blocks }} here is more text'
PS ~> $string -replace '\{(?>\{(?<depth>)|[^{}]+|\}(?<-depth>))*(?(depth)(?!))\}'
Here is text  here is more text
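To see how the counter behaves on other inputs, here is a minimal sketch that reuses the same pattern (stored in a variable, $balanced, a name chosen only for illustration). Nested blocks are removed as one unit; when an outer '{' is never closed, the attempt starting there fails and the engine falls back to the innermost balanced block:

```powershell
# Pattern from the answer: '{', then any mix of nested braces (counted by the
# 'depth' balancing group) and non-brace text, then the matching '}'.
$balanced = '\{(?>\{(?<depth>)|[^{}]+|\}(?<-depth>))*(?(depth)(?!))\}'

# Nested blocks are removed as one unit (note the two spaces left behind).
'a {b {c} {d {e}}} z' -replace $balanced     # -> 'a  z'

# Unbalanced input: the outer '{' is never closed, so only the inner
# balanced block '{ b }' can be matched and removed.
'x { a { b } y' -replace $balanced           # -> 'x { a  y'

# [regex]::Match returns the first balanced block it finds.
$m = [regex]::Match('a {b {c}} z', $balanced)
$m.Value                                     # -> '{b {c}}'
```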
Let's break down the regex pattern:
\{                  # match literal '{'
  (?>               # begin atomic group*
    \{(?<depth>)    # match literal '{' and increment counter
    | [^{}]+        # OR match any sequence of characters that are NOT '{' or '}'
    | \}(?<-depth>) # OR match literal '}' and decrement counter
  )*                # end atomic group, whole group should match 0 or more times
  (?                # begin conditional group
    (depth)(?!)     # if the 'depth' counter > 0, then FAIL!
  )                 # end conditional group
\}                  # match literal '}' (corresponding to the initial '{')
*) The (?>...) atomic grouping prevents backtracking, a safeguard against accidentally counting anything more than once.
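The breakdown above is meant for reading. A minimal sketch of keeping such comments in an actual script uses the (?x) inline option (IgnorePatternWhitespace), which makes the engine ignore unescaped whitespace and treat '#' as the start of a comment; here the conditional is kept on a single line, and the variable names are only for illustration:

```powershell
# (?x) turns on IgnorePatternWhitespace: unescaped whitespace is ignored and
# '#' starts a comment that runs to the end of the line.
$pattern = @'
(?x)
\{                  # match literal '{'
(?>                 # begin atomic group
    \{(?<depth>)    # match literal '{' and push a capture onto 'depth'
  | [^{}]+          # OR match a run of characters that are NOT '{' or '}'
  | \}(?<-depth>)   # OR match literal '}' and pop a capture from 'depth'
)*                  # end atomic group, repeat 0 or more times
(?(depth)(?!))      # conditional: if 'depth' still has captures, FAIL
\}                  # match literal '}' (closing the initial '{')
'@

$string = 'Here is text { but wait { it has } nested { blocks }} here is more text'
$string -replace $pattern    # -> 'Here is text  here is more text'
```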
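To also get rid of the CRLF characters in the remaining fields, we can prefix the pattern with (?s), which makes the regex engine include newlines when matching the . ("any character") metacharacter, and then match everything up to the position just before ~87c5...: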
(?s),Response Content:\s*\{(?>\{(?<depth>)|[^{}]+|\}(?<-depth>))*(?(depth)(?!))\}.*?(?=~)
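A quick way to see what (?s) contributes, sketched against the first sample record rebuilt with explicit CRLF line breaks (the variable names are only for illustration): without the option, .*? cannot get past the first line feed after the JSON, so the lookahead for '~' is never reached and -replace leaves the record untouched.

```powershell
# First sample record, rebuilt with explicit CRLF line breaks
# (the apostrophe in "Couldn't" is doubled inside the single-quoted string).
$record = @(
    'Test Plan Work~Response Status: BadRequest Bad Request,Response Content: {'
    '"trace": "0HM5285F2",'
    '"errors": ['
    '{'
    '"code": "server_error",'
    '"message": "Couldn''t access service ",'
    '"moreInfoUrl": null,'
    '"target": {'
    '"type": null,'
    '"name": null'
    '}'
    '}'
    ']'
    '},Request: https://www.test.com Headers: Accept: application/json'
    'SubscriberId:'
    '~87c5de00-5906-4d2d-b65f-4asdfsdfsdfa29~3/17/2020 1:54:08 PM'
) -join "`r`n"

# Balancing-group matcher for one (possibly nested) JSON object.
$json = '\{(?>\{(?<depth>)|[^{}]+|\}(?<-depth>))*(?(depth)(?!))\}'

# Without (?s), '.' does not match LF, so no match is found at all.
$withoutS = $record -replace (',Response Content:\s*' + $json + '.*?(?=~)')

# With (?s), '.' also matches CR and LF, so everything up to the next '~' goes.
$withS = $record -replace ('(?s),Response Content:\s*' + $json + '.*?(?=~)')

$withoutS -eq $record   # -> True
$withS
# -> Test Plan Work~Response Status: BadRequest Bad Request~87c5de00-5906-4d2d-b65f-4asdfsdfsdfa29~3/17/2020 1:54:08 PM
```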
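Or, perhaps more precisely, we can describe the fields that follow the JSON as repeated pairs of a "," and the non-comma text after it, and make the JSON block itself optional: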
,Response Content:\s*(?:\{(?>\{(?<depth>)|[^{}]+|\}(?<-depth>))*(?(depth)(?!))\})?\s*(?:,[^,]+?)*(?=~)
Let's try it on your multi-line input string:
$string = @'
Test Plan Work~Response Status: BadRequest Bad Request,Response Content: {
"trace": "0HM5285F2",
"errors": [
{
"code": "server_error",
"message": "Couldn't access service ",
"moreInfoUrl": null,
"target": {
"type": null,
"name": null
}
}
]
},Request: https://www.test.com Headers: Accept: application/json
SubscriberId:
~87c5de00-5906-4d2d-b65f-4asdfsdfsdfa29~3/17/2020 1:54:08 PM
'@
$string -replace ',Response Content:\s*(?:\{(?>\{(?<depth>)|[^{}]+|\}(?<-depth>))*(?(depth)(?!))\})?\s*(?:,[^,]+?)*(?=~)'
Output:
Test Plan Work~Response Status: BadRequest Bad Request~87c5de00-5906-4d2d-b65f-4asdfsdfsdfa29~3/17/2020 1:54:08 PM
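One caveat worth checking: as written, the last pattern needs either a JSON object or a comma right after "Response Content:", so the second kind of record (Response Content: Error,Request: ...) passes through -replace unchanged. A minimal sketch of one way around that, assuming no field value ever contains a '~' and that the second record should end up in the same shape as the first, keeps the optional balancing-group JSON matcher but follows it with [^~]* so a plain value and any trailing fields are consumed too. It then parses both cleaned records with ConvertFrom-Csv; the column names are hypothetical, and reading the real file (for example with Get-Content -Raw) is assumed rather than shown.

```powershell
# Both sample records, rebuilt with explicit CRLF line breaks
# (the apostrophe in "Couldn't" is doubled inside the single-quoted string).
$record1 = @(
    'Test Plan Work~Response Status: BadRequest Bad Request,Response Content: {'
    '"trace": "0HM5285F2",'
    '"errors": ['
    '{'
    '"code": "server_error",'
    '"message": "Couldn''t access service ",'
    '"moreInfoUrl": null,'
    '"target": {'
    '"type": null,'
    '"name": null'
    '}'
    '}'
    ']'
    '},Request: https://www.test.com Headers: Accept: application/json'
    'SubscriberId:'
    '~87c5de00-5906-4d2d-b65f-4asdfsdfsdfa29~3/17/2020 1:54:08 PM'
) -join "`r`n"

$record2 = @(
    'Test Plan Pay Work~Response Status: InternalServerError Internal Server Error,Response Content: Error,Request: https://api.test.com Headers: Accept: application/json'
    'Authorization: Bearer eyJhbGciOiJSUzI1NiIsInR5c'
    'SubscriberId: eb7aee'
    '~9d05b16e-e57b-44be-b028-b6ddsdfsdf62a5~1/20/2021 7:07:53 PM'
) -join "`r`n"

# Balancing-group matcher for one (possibly nested) JSON object.
$json = '\{(?>\{(?<depth>)|[^{}]+|\}(?<-depth>))*(?(depth)(?!))\}'

# Pattern as written in the answer: 'Error' is neither '{' nor ',', so no match.
$asWritten = ',Response Content:\s*(?:' + $json + ')?\s*(?:,[^,]+?)*(?=~)'
($record2 -replace $asWritten) -eq $record2   # -> True (record left unchanged)

# Adjusted sketch: optional JSON object, then any non-'~' text up to the next '~'.
$adjusted = ',Response Content:\s*(?:' + $json + ')?[^~]*(?=~)'

# Treat both records as one file body (what Get-Content -Raw would return).
$text    = $record1, $record2 -join "`r`n"
$cleaned = $text -replace $adjusted

# Each record now fits on one line, so it can be parsed as '~'-delimited data.
# Hypothetical column names: the question does not show a header row.
$rows = $cleaned -split "`r`n" |
    ConvertFrom-Csv -Delimiter '~' -Header Plan, Status, Id, Timestamp
$rows | Format-Table
```

The [^~]* tail is deliberately simpler than the repeated ",<text>" pairs: once the JSON is consumed, the only thing that matters for this format is where the next tilde starts.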