
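Problem description

I am trying to search for a specific line in an email body. I have already been able to extract the whole email body. Now I want to extract a specific line from it. My code so far: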

import email
import re

resp, items = conn.uid("search", None, 'All')
items = items[0].split()
for emailid in items:
    resp, data = conn.uid("fetch", emailid, "(RFC822)")
    if resp == 'OK':
        email_body = data[0][1].decode('utf-8')
        mail = email.message_from_string(email_body)
        # str.find() returns 0 for a match at index 0, so "> 0" would skip it;
        # a membership test avoids that.
        subject = mail["Subject"] or ""
        if "PA1" in subject or "PA2" in subject:
            regex = r"(\bEvent demon log entry:)(?:\r?\n|\r)+(\[[^]]+\].*)"
            a = re.findall(regex, email_body, re.IGNORECASE)

I am currently getting these lines:

[(u'Event demon log entry:', u'[27/12/2018 05:29:30]      CAUAJM_I_40245 EVENT: ALARM            ALARM: JO=\r')]
[(u'Event demon log entry:', u'[27/12/2018 04:58:05] CAUAJM_I_40245 EVENT: ALARM ALARM: JOBFAILURE JOB: p2=\r')]
[(u'Event demon log entry:', u'[27/12/2018 06:00:03]      CAUAJM_I_40245 EVENT: ALARM            ALARM: JO=\r')]
[(u'Event demon log entry:', u'[27/12/2018 07:00:05]      CAUAJM_I_40245 EVENT: ALARM            ALARM: JO=\r')]

But I want to get everything between [(u'Event demon log entry:', u'[27/12/2018 05:29:30] and EVENT: ALARM ALARM: JO=\r')]

Expected output:

CAUAJM_I_40245 EVENT

Raw text from the email body:

Event demon log entry:

[27/12/2018 04:48:17]      CAUAJM_I_40245 EVENT: ALARM            ALARM: JOBFAILURE       JOB: bx_p2_reporting EXITCODE:  1

Update:

It turns out I need to get the following information:

JOB: bx_p2_reporting EXITCODE:  1

Event demon log entry:

[26/12/2018 20:17:14] CAUAJM_I_40245 EVENT: ALARM ALARM: JOBFAILURE JOB: p2=
_batch_excel_RevalFutBasisSpdCalc_NY3pm MACHINE: ldnmdsbatchxl01 EXITCODE: =
268438455

Tags: python, regex, list

Solution

You can use

r'Event demon log entry:[\r\n]*\[[^]]+]\s*(.*?)\s*EVENT: ALARM'


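If you use it with re.findall, you should get just CAUAJM_I_40245, because the pattern has a single capturing group and re.findall then returns only that group's text.

Details

  • Event demon log entry: - a literal substring
  • [\r\n]* - zero or more CR or LF characters
  • \[ - a [ character
  • [^]]+ - one or more characters other than ]
  • ] - a ] character
  • \s* - zero or more whitespace characters
  • (.*?) - Group 1: any zero or more characters other than line breaks, as few as possible
  • \s* - zero or more whitespace characters
  • EVENT: ALARM - a literal substring.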
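The breakdown above can also be written directly into the pattern with re.VERBOSE, which may be easier to maintain. This is only a sketch of the same pattern in a different layout; the names rx_verbose, s and result are made up for illustration.

```python
import re

# Same pattern as in the answer, laid out with re.VERBOSE so each piece can
# carry a comment. In verbose mode unescaped whitespace in the pattern is
# ignored, so literal spaces are written as "[ ]".
rx_verbose = re.compile(
    r"""
    Event[ ]demon[ ]log[ ]entry:   # literal substring
    [\r\n]*                        # zero or more CR or LF characters
    \[ [^]]+ ]                     # the bracketed timestamp
    \s*                            # zero or more whitespace characters
    (.*?)                          # Group 1: as few non-newline characters as possible
    \s*                            # zero or more whitespace characters
    EVENT:[ ]ALARM                 # literal substring
    """,
    re.IGNORECASE | re.VERBOSE,
)

s = ("Event demon log entry:\n\n[27/12/2018 04:48:17]      CAUAJM_I_40245 "
     "EVENT: ALARM            ALARM: JOBFAILURE       JOB: bx_p2_reporting EXITCODE:  1")

match = rx_verbose.search(s)
result = match.group(1) if match else None
print(result)
```

re.search is used here instead of re.findall because only the first entry is needed; match.group(1) is the text captured between the timestamp and EVENT: ALARM.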

Python demo

import re
rx = r"Event demon log entry:[\r\n]*\[[^]]+]\s*(.*?)\s*EVENT: ALARM"
s = "Event demon log entry:\n\n[27/12/2018 04:48:17]      CAUAJM_I_40245 EVENT: ALARM            ALARM: JOBFAILURE       JOB: bx_p2_reporting EXITCODE:  1"
print(re.findall(rx, s, re.IGNORECASE))
# => ['CAUAJM_I_40245']
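The pattern above covers the original request. For the update in the question (the JOB name and the EXITCODE value), a possible sketch follows. It assumes that the trailing "=" at the line ends in the raw body ("p2=", "EXITCODE: =") are quoted-printable soft line breaks, which the question does not confirm, so the body is decoded with quopri.decodestring before matching. The names JOB_RX and extract_job_and_exit_code are hypothetical.

```python
import quopri
import re

# Hypothetical pattern: after the timestamp, skip ahead lazily to "JOB:" and
# capture the job name, then skip ahead to "EXITCODE:" and capture the number.
JOB_RX = re.compile(
    r"Event demon log entry:\s*\[[^]]+]\s*.*?JOB:\s*(\S+).*?EXITCODE:\s*(-?\d+)",
    re.IGNORECASE | re.DOTALL,
)

def extract_job_and_exit_code(raw_body):
    """Return a list of (job, exit_code) string pairs found in raw_body."""
    # Assumption: the body is quoted-printable encoded, so "=" directly
    # before a line break is a soft line break and must be removed first.
    decoded = quopri.decodestring(raw_body.encode("utf-8")).decode("utf-8")
    return JOB_RX.findall(decoded)

sample = (
    "Event demon log entry:\r\n\r\n"
    "[26/12/2018 20:17:14] CAUAJM_I_40245 EVENT: ALARM ALARM: JOBFAILURE JOB: p2=\r\n"
    "_batch_excel_RevalFutBasisSpdCalc_NY3pm MACHINE: ldnmdsbatchxl01 EXITCODE: =\r\n"
    "268438455\r\n"
)
pairs = extract_job_and_exit_code(sample)
print(pairs)
```

Because re.DOTALL lets .*? cross line breaks, an entry that lacks JOB: or EXITCODE: could make the match run into the next entry; if that can happen in your mails, split the body into entries first. Parsing the message with the email package and reading the part with get_payload(decode=True) is another way to undo the transfer encoding.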
