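regex - Need help extracting data that follows a pattern from a list, as shown below
Problem description
I have sample data like the list shown below, and I want to extract the entries that follow a key/value pattern, for example:
'Signature Algorithm: ecdsa-with-SHA256', 'Not Before: Jul 23 00:00:00 2018 GMT', 'Not After : Jul 19 00:00:00 2033 GMT', 'Public Key Algorithm: id-ecPublicKey', 'Public-Key: (256 bit)'
Here is the sample: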
['Certificate:', 'Data:', 'Version: 3 (0x2)', 'Serial Number:', 'b9:d9:f5:38:f8:42:6a:f9', 'Signature Algorithm: ecdsa-with-SHA256', 'Issuer: C=US, ST=NC, L=Raleigh, O=Eaton Corporation, OU=Electrical Division, CN=PowerXpert-02-00-34-56-63-01', 'Validity', 'Not Before: Jul 23 00:00:00 2018 GMT', 'Not After : Jul 19 00:00:00 2033 GMT', 'Subject: C=US, ST=NC, L=Raleigh, O=Eaton Corporation, OU=Electrical Division, CN=PowerXpert-02-00-34-56-63-01', 'Subject Public Key Info:', 'Public Key Algorithm: id-ecPublicKey', 'Public-Key: (256 bit)', 'pub:', '04:bf:72:4b:01:8b:5c:46:98:96:a6:d6:06:7b:d8:', '73:50:2f:47:85:60:f0:38:25:79:3d:96:be:20:3b:', '3f:39:c8:58:62:e9:d7:b6:f8:3a:6b:24:50:e1:5c:', '78:ce:5e:28:f2:60:3a:b6:cc:43:0e:0b:2b:d6:03:', '51:76:21:a4:78', 'ASN1 OID: prime256v1', 'NIST CURVE: P-256', 'X509v3 extensions:', 'X509v3 Subject Key Identifier:', '6A:E0:28:A7:17:2B:65:01:FD:31:48:5C:68:24:94:4B:42:49:76:58', 'X509v3 Basic Constraints: critical', 'CA:TRUE', 'X509v3 Key Usage: critical', 'Digital Signature, Certificate Sign, CRL Sign', 'X509v3 Subject Alternative Name:', 'DNS:PXG900-56-63-01', 'Signature Algorithm: ecdsa-with-SHA256', '30:45:02:20:43:33:58:ce:ef:f7:fd:a8:60:21:15:a3:2b:35:', '8c:1f:13:a0:1e:77:05:6f:1a:bb:a0:b6:fe:f3:ea:7b:6d:31:', '02:21:00:cf:db:9a:d1:6b:88:ae:fb:d5:5c:5a:db:0a:a0:eb:', 'a9:c9:4a:52:d0:57:18:9c:58:1b:67:42:47:c5:ec:bf:b0', '', '']
I tried the regular expression below, but it did not give the desired result:
import re
regex = re.compile(r'''
[\S]+: # a key (any word followed by a colon)
(?:
\s # then a space in between
(?!\S+:)\S+ # then a value (any word not followed by a colon)
)+ # match multiple values if present
''', re.VERBOSE)
matches = regex.findall(str(lines))
print(matches)
Solution
You can use the following regular expression:
(?='[A-Za-z]+[\s-][A-Z][a-z]+\s?:?)'[^']+:[^']+'
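Explanation:
(?= ... ) is a positive lookahead. It asserts that the text at the current position is:
  '[A-Za-z]+   an apostrophe ' followed by one or more letters,
  [\s-]        a whitespace character or a hyphen -,
  [A-Z][a-z]+  an uppercase letter followed by one or more lowercase letters,
  \s?          an optional whitespace character,
  :?           an optional colon.
After the lookahead, the pattern consumes the element itself:
  '[^']+       an apostrophe ' followed by one or more characters that are not apostrophes,
  :            a colon,
  [^']+'       one or more non-apostrophe characters followed by the closing apostrophe '.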
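The breakdown above can also be written straight into the pattern using re.VERBOSE, which may make it easier to maintain. This is only a sketch: the pattern is the one from the answer, while the names `pattern`, `sample` and `matches` and the shortened sample list are assumptions made for illustration.

```python
import re

# Same pattern as in the answer, spread over several lines with re.VERBOSE.
# Whitespace outside character classes is ignored in this mode.
pattern = re.compile(r"""
    (?=                  # lookahead: the element must start like a multi-word key
        '[A-Za-z]+       # opening apostrophe, then letters
        [\s-]            # a whitespace character or a hyphen
        [A-Z][a-z]+      # a capitalised word
        \s?              # optional whitespace
        :?               # optional colon
    )
    '[^']+               # opening apostrophe and the key text
    :                    # the colon between key and value
    [^']+'               # the value and the closing apostrophe
""", re.VERBOSE)

# Shortened, illustrative sample (an assumption, not the full list from the question).
sample = ['Serial Number:', 'Signature Algorithm: ecdsa-with-SHA256',
          'Validity', 'Public-Key: (256 bit)']

# str(sample) wraps every element in apostrophes, which the pattern relies on.
matches = pattern.findall(str(sample))
for match in matches:
    print(match)
```

'Serial Number:' is skipped because nothing follows its colon, and 'Validity' is skipped because it contains no colon at all.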
You can test the regular expression here.
Python snippet:
import re
lines = ['Certificate:', 'Data:', 'Version: 3 (0x2)', 'Serial Number:', 'b9:d9:f5:38:f8:42:6a:f9', 'Signature Algorithm: ecdsa-with-SHA256', 'Issuer: C=US, ST=NC, L=Raleigh, O=Eaton Corporation, OU=Electrical Division, CN=PowerXpert-02-00-34-56-63-01', 'Validity', 'Not Before: Jul 23 00:00:00 2018 GMT', 'Not After : Jul 19 00:00:00 2033 GMT','Public Key Algorithm: id-ecPublicKey',
'Public-Key: (256 bit)']
# str(lines) renders every element wrapped in apostrophes, which the pattern relies on
matches = re.findall(r"(?='[A-Za-z]+[\s-][A-Z][a-z]+\s?:?)'[^']+:[^']+'", str(lines))
for match in matches:
    print(match)
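Output (for the full list from the question; the shortened list in the snippet above yields only the first five lines):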
'Signature Algorithm: ecdsa-with-SHA256'
'Not Before: Jul 23 00:00:00 2018 GMT'
'Not After : Jul 19 00:00:00 2033 GMT'
'Public Key Algorithm: id-ecPublicKey'
'Public-Key: (256 bit)'
'Signature Algorithm: ecdsa-with-SHA256'
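If matching against the stringified list feels fragile (it depends on every element being rendered with apostrophes), one possible variant is to test each element on its own and capture the key and value separately. The sketch below is not part of the original answer: `KEY_VALUE`, `extract_pairs` and `sample` are hypothetical names, and the per-element pattern is an adaptation of the key shape used above.

```python
import re

# Hypothetical per-element pattern (an adaptation, not the answer's regex):
# group 1 is a multi-word key, group 2 is the non-empty value after the colon.
KEY_VALUE = re.compile(r"([A-Za-z]+[\s-][A-Z][a-z]+[^:]*?)\s*:\s*(.+)")

def extract_pairs(items):
    """Return (key, value) tuples for elements shaped like 'Key Name: value'."""
    pairs = []
    for item in items:
        m = KEY_VALUE.fullmatch(item)
        if m:
            pairs.append((m.group(1), m.group(2)))
    return pairs

# Shortened, illustrative sample taken from the data in the question.
sample = [
    'Certificate:', 'Version: 3 (0x2)', 'Serial Number:', 'b9:d9:f5:38:f8:42:6a:f9',
    'Signature Algorithm: ecdsa-with-SHA256', 'Validity',
    'Not Before: Jul 23 00:00:00 2018 GMT', 'Not After : Jul 19 00:00:00 2033 GMT',
    'Public Key Algorithm: id-ecPublicKey', 'Public-Key: (256 bit)',
]

for key, value in extract_pairs(sample):
    print(f"{key} -> {value}")
```

Returning tuples instead of a dict keeps repeated keys, such as the two 'Signature Algorithm' entries in the full certificate dump.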