How do I debug a (PCRE) regular expression passed to grep?

Problem description

I am trying to debug a regular expression that I pass to grep; it does not seem to work on my system.

This is the full command, which should return the latest terraform version:

wget -qO - "https://api.github.com/repos/hashicorp/terraform/releases/latest" | grep -Po '"tag_name": "v\K.*?(?=")'

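This seems to work for other people, but not for me. Adding a * quantifier after the space that follows "tag_name": (so that zero or more spaces are matched) makes it work for me: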

wget -qO - "https://api.github.com/repos/hashicorp/terraform/releases/latest" | grep -Po '"tag_name": *"v\K.*?(?=")'

This is the response from wget without piping it to grep:

{
  "url": "https://api.github.com/repos/hashicorp/terraform/releases/20814583",
  "assets_url": "https://api.github.com/repos/hashicorp/terraform/releases/20814583/assets",
  "upload_url": "https://uploads.github.com/repos/hashicorp/terraform/releases/20814583/assets{?name,label}",
  "html_url": "https://github.com/hashicorp/terraform/releases/tag/v0.12.12",
  "id": 20814583,
  "node_id": "MDc6UmVsZWFzZTIwODE0NTgz",
  "tag_name": "v0.12.12",
  "target_commitish": "master",
  "name": "",
  "draft": false,
  "author": {
    "login": "apparentlymart",
    "id": 20180,
    "node_id": "MDQ6VXNlcjIwMTgw",
    "avatar_url": "https://avatars1.githubusercontent.com/u/20180?v=4",
    "gravatar_id": "",
    "url": "https://api.github.com/users/apparentlymart",
    "html_url": "https://github.com/apparentlymart",
    "followers_url": "https://api.github.com/users/apparentlymart/followers",
    "following_url": "https://api.github.com/users/apparentlymart/following{/other_user}",
    "gists_url": "https://api.github.com/users/apparentlymart/gists{/gist_id}",
    "starred_url": "https://api.github.com/users/apparentlymart/starred{/owner}{/repo}",
    "subscriptions_url": "https://api.github.com/users/apparentlymart/subscriptions",
    "organizations_url": "https://api.github.com/users/apparentlymart/orgs",
    "repos_url": "https://api.github.com/users/apparentlymart/repos",
    "events_url": "https://api.github.com/users/apparentlymart/events{/privacy}",
    "received_events_url": "https://api.github.com/users/apparentlymart/received_events",
    "type": "User",
    "site_admin": false
  },
  "prerelease": false,
  "created_at": "2019-10-18T18:39:16Z",
  "published_at": "2019-10-18T18:45:33Z",
  "assets": [],
  "tarball_url": "https://api.github.com/repos/hashicorp/terraform/tarball/v0.12.12",
  "zipball_url": "https://api.github.com/repos/hashicorp/terraform/zipball/v0.12.12",
  "body": "BUG FIXES:\r\n\r\n* backend/remote: Don't do local validation of whether variables are set prior to submitting, because only the remote system knows the full set of configured stored variables and environment variables that might contribute. This avoids erroneous error messages about unset required variables for remote runs when those variables will be set by stored variables in the remote workspace. ([#23122](https://github.com/hashicorp/terraform/issues/23122))"
}

Using https://regex101.com, I can see that both "tag_name": "v\K.*?(?=") and "tag_name": *"v\K.*?(?=") correctly match the version number.

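So something must be wrong on my system. I am curious why the original command does not work for me, and how (if at all possible) to debug a situation like this.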

Tags: regex, bash, shell, grep, pcre

Solution


I was able to narrow it down to the following. If I run the wget command without piping it to grep and without formatting the JSON response:

wget -qO - "https://api.github.com/repos/hashicorp/terraform/releases/latest"

then I get JSON without any whitespace (I am posting only part of the response):

"html_url":"https://github.com/hashicorp/terraform/releases/tag/v0.12.12","id":20814583,"node_id":"MDc6UmVsZWFzZTIwODE0NTgz","tag_name":"v0.12.12","target_commitish":"master","name":"","draft":false

So, naturally, the original regular expression "tag_name": "v\K.*?(?=") fails, because in this response there is no space after the colon.
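The difference can be reproduced offline. The following is a minimal sketch, assuming GNU grep built with PCRE support (the -P option); the two sample strings and all variable names are hypothetical and only modelled on the responses quoted in this post.

```shell
# Hypothetical one-line samples modelled on the two response styles above.
pretty='  "tag_name": "v0.12.12",'
compact='"node_id":"MDc6UmVsZWFzZTIwODE0NTgz","tag_name":"v0.12.12","target_commitish":"master"'

# The original pattern requires exactly one space after the colon.
strict='"tag_name": "v\K.*?(?=")'
# The adjusted pattern allows zero or more spaces after the colon.
lenient='"tag_name": *"v\K.*?(?=")'

# grep exits with status 1 when nothing matches, hence the "|| true".
strict_pretty=$(printf '%s\n' "$pretty" | grep -Po "$strict" || true)
strict_compact=$(printf '%s\n' "$compact" | grep -Po "$strict" || true)
lenient_pretty=$(printf '%s\n' "$pretty" | grep -Po "$lenient" || true)
lenient_compact=$(printf '%s\n' "$compact" | grep -Po "$lenient" || true)

echo "strict  / pretty : [$strict_pretty]"
echo "strict  / compact: [$strict_compact]"
echo "lenient / pretty : [$lenient_pretty]"
echo "lenient / compact: [$lenient_compact]"
```

Only the strict pattern on the compact sample comes back empty, which matches the behaviour described in the question.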

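This clearly has nothing to do with the regular expression passed to grep, or with grep itself. I do not see the point of digging into the response itself here, so the original question can be considered solved (although if anyone knows what might cause this, please leave a comment).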
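As for how to debug a case like this, one possible approach is to look at the bytes grep actually receives instead of at a formatted copy of them. The sketch below assumes the response body has been captured first; here a hypothetical sample string named response stands in for it, so nothing is fetched from the network.

```shell
# Hypothetical sample standing in for the captured response body (compact form).
response='{"id":20814583,"tag_name":"v0.12.12","target_commitish":"master"}'

# Step 1: match only the key plus up to 12 following characters,
# making no assumption about the whitespace after the colon.
context=$(printf '%s\n' "$response" | grep -o '"tag_name".\{0,12\}')
echo "$context"

# Step 2: dump those bytes one by one, so that spaces, tabs and
# carriage returns (or their absence) become visible.
printf '%s' "$context" | od -c
```

If the dump shows no space between the colon and the opening quote, the pattern is being run against different text than the text that was pasted into regex101.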

