Get a list of struct names outside the optional 'package' ... 'endpackage' block

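Problem description

I am trying to get the names of the structs that lie outside an optional package ... endpackage block. If the input contains no package ... endpackage block, the script should return all struct names.

This is my script: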

import re

a = """
package new;

typedef struct packed
{
    logic a;
    logic b;
} abc_y;

typedef struct packed
{
    logic a;
    logic b;
} abc_t;

endpackage

typedef struct packed
{
    logic a;
    logic b;
} abc_x;

"""

print(re.findall(r'(?!package)*.*?typedef\s+struct\s+packed\s*{.*?}\s*(\w+);.*?(?!endpackage)*', a, re.MULTILINE|re.DOTALL))

This is the output:

['abc_y', 'abc_t', 'abc_x']

Expected output:

['abc_x']

I am missing something in the regular expression, but I can't tell what. Can someone help me solve this? Thanks in advance.

Tags: python, python-3.x, regex, python-2.7, re

Solution

Use

\bpackage.*?\bendpackage\b|typedef\s+struct\s+packed\s*{[^{}]*}\s*(\w+);

See the regex proof.

Explanation

--------------------------------------------------------------------------------
  \b                       the boundary between a word char (\w) and
                           something that is not a word char
--------------------------------------------------------------------------------
  package                  'package'
--------------------------------------------------------------------------------
  .*?                      any character, including \n because
                           re.DOTALL is set (0 or more times
                           (matching the least amount possible))
--------------------------------------------------------------------------------
  \b                       the boundary between a word char (\w) and
                           something that is not a word char
--------------------------------------------------------------------------------
  endpackage               'endpackage'
--------------------------------------------------------------------------------
  \b                       the boundary between a word char (\w) and
                           something that is not a word char
--------------------------------------------------------------------------------
 |                        OR
--------------------------------------------------------------------------------
  typedef                  'typedef'
--------------------------------------------------------------------------------
  \s+                      whitespace (\n, \r, \t, \f, and " ") (1 or
                           more times (matching the most amount
                           possible))
--------------------------------------------------------------------------------
  struct                   'struct'
--------------------------------------------------------------------------------
  \s+                      whitespace (\n, \r, \t, \f, and " ") (1 or
                           more times (matching the most amount
                           possible))
--------------------------------------------------------------------------------
  packed                   'packed'
--------------------------------------------------------------------------------
  \s*                      whitespace (\n, \r, \t, \f, and " ") (0 or
                           more times (matching the most amount
                           possible))
--------------------------------------------------------------------------------
  {                        '{'
--------------------------------------------------------------------------------
  [^{}]*                   any character except: '{', '}' (0 or more
                           times (matching the most amount possible))
--------------------------------------------------------------------------------
  }                        '}'
--------------------------------------------------------------------------------
  \s*                      whitespace (\n, \r, \t, \f, and " ") (0 or
                           more times (matching the most amount
                           possible))
--------------------------------------------------------------------------------
  (                        group and capture to \1:
--------------------------------------------------------------------------------
    \w+                      word characters (a-z, A-Z, 0-9, _) (1 or
                             more times (matching the most amount
                             possible))
--------------------------------------------------------------------------------
  )                        end of \1
--------------------------------------------------------------------------------
  ;                        ';'
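
Why the lookaheads in the question have no effect, and why the alternation does: a negative lookahead such as (?!package) is zero-width, so it only tests the text at the current position and never excludes a whole region. The first alternative above instead consumes the entire package block, so the scan resumes after endpackage. The following is a minimal sketch of both behaviours; the shortened input string and the simplified struct pattern are made up for illustration and are not taken from the question.

```python
import re

# A negative lookahead only checks the current position. It fails at index 0,
# so the engine simply moves one character to the right and matches there.
m = re.search(r'(?!package)\w+', 'package')
print(m.group())

# The alternation consumes the whole package block as one match, so anything
# inside it can no longer be matched by the second alternative.
# (Illustrative input and simplified pattern, assumed for this sketch.)
text = 'package p; struct a; endpackage struct b;'
consumed = re.findall(r'\bpackage.*?\bendpackage\b|struct\s+(\w+);',
                      text, re.DOTALL)
print(consumed)
```

The package match contributes an empty string to the findall result because its capture group did not participate, which is why the code below filters out empty entries.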

Python code

print(list(filter(None,re.findall(r'\bpackage.*?\bendpackage\b|typedef\s+struct\s+packed\s*{[^{}]*}\s*(\w+);', a, re.DOTALL))))

Result: ['abc_x']
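
If this is needed in more than one place, the one-liner can be wrapped in a small helper. This is only a sketch: the names STRUCT_OUTSIDE_PACKAGE and struct_names_outside_packages are hypothetical, and the two sample inputs are shortened versions of the one in the question. It also checks the second requirement, that all struct names are returned when there is no package block.

```python
import re

# Same pattern as in the answer, split over two lines for readability.
STRUCT_OUTSIDE_PACKAGE = re.compile(
    r'\bpackage.*?\bendpackage\b'                       # swallow a whole package block
    r'|typedef\s+struct\s+packed\s*{[^{}]*}\s*(\w+);',  # capture a struct name
    re.DOTALL,
)

def struct_names_outside_packages(source):
    """Return struct names not enclosed in a package ... endpackage block."""
    # Matches of the first alternative leave group 1 unset (None); skip them.
    return [m.group(1)
            for m in STRUCT_OUTSIDE_PACKAGE.finditer(source)
            if m.group(1)]

with_package = """
package new;
typedef struct packed { logic a; } abc_y;
endpackage
typedef struct packed { logic a; } abc_x;
"""

without_package = """
typedef struct packed { logic a; } abc_y;
typedef struct packed { logic a; } abc_x;
"""

print(struct_names_outside_packages(with_package))
print(struct_names_outside_packages(without_package))
```

Note that the pattern works on raw text, so the word package inside a comment or string literal would also start a skipped block; input like that would need to be stripped of comments first.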

