Regex works fine on limited input but the script hangs when the whole input file is given

Problem description

I am writing a script that uses a regex to extract the desired text from different blocks of input. The regex works on a limited sample, but when the whole input file is given it never finishes and the script hangs. Can someone please help me fix the regex?

This is my script:

import re
a = """
  abc # (.C          (1),
         .H          (1)
           )   
        xyz [M-1:0]
      (.a        (a),
       .y        (y),
        .e       (e)
       );   

  por
    chk [N-1:0] (/*AUTOINST*/
                      // Outputs

  elem  line[E-1:0] (.en (en    [E-1:0]),

generate
for(i = 0) begin: check
    check
          #(
            .F                 (1::t),
            .C                 (1),
            .H                 (1)
        )
        data_check
           (// Outputs
  
except_check
          #(
            .a        (m),
            .b        (w),
            .e        (1)
        )
        data_check
           (// Outputs

    block1
          #(/*AUTOINSTPARAM*/
        // Parameters
        .THREE          (3),     // comment
        .TWO            (2), // comment
        .ONE    (1))             // comment
        inst1
           (/*AUTOINST*/
        // extra
        // output

    block2
          #(/*AUTOINSTPARAM*/
        // Parameters
        .THREE          (3),     // comment
        .TWO            (2), // comment
        .ONE    (1))             // comment
        inst2
           (/*AUTOINST*/
        // extra
        // output

"""

op = re.findall(r'^\s*(\w+)\s*\n*(?:\s*[^\w\s].*\n*)*\s*(\w+)\s*(?:\[.*\])*\s*\(', a, re.MULTILINE)

for i in op:
    print(i)

This is the output:

('abc', 'xyz')
('por', 'chk')
('elem', 'line')
('generate', 'for')
('check', 'data_check')
('except_check', 'data_check')
('block1', 'inst1')
('block2', 'inst2')

Now, if I add the following lines at the end of the input string a in the script, the script just hangs and I need to kill it with Ctrl+C.

a = """
  abc # (.C          (1),
         .H          (1)
           )   

< copy same as above and add following at the end >

output [`X-1:0]                    o,            // o

////////////////////////////////////

"""

After I kill it, I see this traceback:

^CTraceback (most recent call last):
  File "1.py", line 66, in <module>
    op = re.findall(r'^\s*(\w+)\s*\n*(?:\s*[^\w\s].*\n*)*\s*(\w+)\s*(?:\[.*\])*\s*\(', a, re.MULTILINE)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/re.py", line 181, in findall
    return _compile(pattern, flags).findall(string)
    KeyboardInterrupt

I am not sure what is so special about those two lines. I cannot figure out how to handle this case in the regex, and the whole input file can contain more lines of this kind that the regex does not account for. The regex contains several .* parts, which might be causing the problem, but I am not sure. It would be great if someone could help me fix it.
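
The suspicion about .* can be checked with a small timing experiment. The sketch below assumes Python 3 and a synthetic input; the helper names make_input and time_findall are made up for illustration. The input is one word line followed by a line of n slashes, with no opening parenthesis anywhere, so no match is possible. Inside (?:\s*[^\w\s].*\n*)*, every slash can either start a new repetition or be swallowed by the previous .*, so the engine has to try on the order of 2^n ways of splitting that line before it can give up.

```python
import re
import time

# The pattern from the question, unchanged.
PATTERN = re.compile(
    r'^\s*(\w+)\s*\n*(?:\s*[^\w\s].*\n*)*\s*(\w+)\s*(?:\[.*\])*\s*\(',
    re.MULTILINE,
)


def make_input(n_slashes):
    """Hypothetical helper: a word line, then a line of n slashes, no '('."""
    return "x\n" + "/" * n_slashes + "\n"


def time_findall(n_slashes):
    """Hypothetical helper: return (matches, seconds) for one findall call."""
    text = make_input(n_slashes)
    start = time.perf_counter()
    matches = PATTERN.findall(text)
    return matches, time.perf_counter() - start


# Kept deliberately small: each extra slash should roughly double the work.
for n in (8, 12, 16):
    matches, seconds = time_findall(n)
    print(n, matches, round(seconds, 4))
```

If the timings grow by a large factor from one row to the next while the match list stays empty, that points at backtracking in the repeated group rather than at anything specific to the output line. The 36-slash line from the input above is far beyond what this kind of search finishes in practice.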

Tags: python, regex

Solution
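
One possible way to avoid the hang, offered as a sketch rather than a verified fix for the whole file. In the original pattern the skip group (?:\s*[^\w\s].*\n*)* can consume a line such as //////// in a huge number of ways, because .* may stop early and \n* may match nothing, so the next repetition can start in the middle of the same line. When no "(" follows, the engine tries all of those splits before failing. The rewrite below forces every repetition to run to the end of its line and consume the newline, which leaves only one way to skip a punctuation line. The names FIXED and find_instances, and the shortened sample, are assumptions made for this example. Note one behavior change: the second word must now sit on a later line than any parameter block, so a single-line form like abc #(...) xyz ( would no longer match. The (?:\[.*\])* part is left as in the question.

```python
import re

# Assumed rewrite of the question's pattern (not the author's code): each
# repetition of the skip group must reach the end of its line and consume the
# newline, so a line of punctuation can be skipped in only one way.
FIXED = re.compile(
    r'^\s*(\w+)\s*(?:[^\w\s].*\n\s*)*(\w+)\s*(?:\[.*\])*\s*\(',
    re.MULTILINE,
)


def find_instances(text):
    """Return (first_word, second_word) pairs, like the original findall call."""
    return FIXED.findall(text)


# Shortened copy of the question's input, ending with the two lines that
# made the original pattern hang.
SAMPLE = """
  abc # (.C          (1),
         .H          (1)
           )
        xyz [M-1:0]
      (.a        (a),
       .y        (y),
        .e       (e)
       );

  por
    chk [N-1:0] (/*AUTOINST*/
                      // Outputs

    block1
          #(/*AUTOINSTPARAM*/
        // Parameters
        .THREE          (3),     // comment
        .TWO            (2), // comment
        .ONE    (1))             // comment
        inst1
           (/*AUTOINST*/
        // extra
        // output

output [`X-1:0]                    o,            // o

////////////////////////////////////

"""

for pair in find_instances(SAMPLE):
    print(pair)
```

On this shortened sample the call should return the abc/xyz, por/chk and block1/inst1 pairs and finish immediately, since the trailing slash line can only be skipped whole. It has not been checked against the full input file, so the results there should be compared with the original output before relying on it.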

