Handling ZeroOrMore in pyparsing

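Question

I am trying to parse the output of pactl list with pyparsing. Everything parses correctly so far, except that I cannot get ZeroOrMore to behave the way I need.

The output contains both foo: lines (no value) and foo: bar lines (with a value). I tried to handle both with ZeroOrMore, but it does not work. I had to add a special case, "Argument:", to match the entry that has no value. That is not a real fix, because there are also Argument: foo entries (with a value), and I expect other properties to show up without a value as well.

With this definition and a fixed sample of pactl list output: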

#!/usr/bin/env python

#
# parsing pactl list
#

from pyparsing import *
import os
from subprocess import check_output
import sys

data = '''
Module #6
    Argument:
    Name: module-alsa-card
    Usage counter: 0
    Properties:
        module.author = "Lennart Poettering"
        module.description = "ALSA Card"
        module.version = "14.0-rebootstrapped"
'''

indentStack = [1]
stmt = Forward()

identifier = Word(alphanums+"-_.")

sect_def = Group(Group(identifier) + Suppress("#") + Group(Word(nums)))
inner_section = indentedBlock(stmt, indentStack)
section = (sect_def + inner_section)

value = Group(Group(Combine(OneOrMore(identifier|White(' ')))) + Suppress(":") + Group(Combine(ZeroOrMore(Word(alphanums+'-/=_".')|White(' ', max=1)))))
prop_name = Literal("Properties:")
prop_section = indentedBlock(stmt, indentStack)
prop_val = Group(Group(identifier) + Suppress("=")  + Group(Combine(OneOrMore(Word(alphanums+'-"/.')|White(' \t')))))
prop = (prop_name + prop_section)

stmt << ( section | prop | ("Argument:") | value | prop_val )

syntax = OneOrMore(stmt)

parseTree = syntax.parseString(data)
parseTree.pprint()

This produces:

$ ./pactl.py

Module #6
    Argument:
    Name: module-alsa-card
    Usage counter: 0
    Properties:
        module.author = "Lennart Poettering"
        module.description = "ALSA Card"
        module.version = "14.0-rebootstrapped"
[[['Module'], ['6']],
 [['Argument:'],
  [[['Name'], ['module-alsa-card']]],
  [[['Usage counter'], ['0']]],
  ['Properties:',
   [[[['module.author'], ['"Lennart Poettering"']]],
    [[['module.description'], ['"ALSA Card"']]],
    [[['module.version'], ['"14.0-rebootstrapped"']]]]]]]

So far so good, but removing the "Argument:" special case fails, because ZeroOrMore does not behave as I expected:

#!/usr/bin/env python

#
# parsing pactl list
#

from pyparsing import *
import os
from subprocess import check_output
import sys

data = '''
Module #6
    Argument:
    Name: module-alsa-card
    Usage counter: 0
    Properties:
        module.author = "Lennart Poettering"
        module.description = "ALSA Card"
        module.version = "14.0-rebootstrapped"
'''

indentStack = [1]
stmt = Forward()

identifier = Word(alphanums+"-_.")

sect_def = Group(Group(identifier) + Suppress("#") + Group(Word(nums)))
inner_section = indentedBlock(stmt, indentStack)
section = (sect_def + inner_section)

value = Group(Group(Combine(OneOrMore(identifier|White(' ')))) + Suppress(":") + Group(Combine(ZeroOrMore(Word(alphanums+'-/=_".')|White(' ', max=1))))).setDebug()
prop_name = Literal("Properties:")
prop_section = indentedBlock(stmt, indentStack)
prop_val = Group(Group(identifier) + Suppress("=")  + Group(Combine(OneOrMore(Word(alphanums+'-"/.')|White(' \t')))))
prop = (prop_name + prop_section)

stmt << ( section | prop | value | prop_val )


syntax = OneOrMore(stmt)

parseTree = syntax.parseString(data)
parseTree.pprint()

This results in:

$ ./pactl.py

Module #6
    Argument:
    Name: module-alsa-card
    Usage counter: 0
    Properties:
        module.author = "Lennart Poettering"
        module.description = "ALSA Card"
        module.version = "14.0-rebootstrapped"
Match Group:({Group:(Combine:({{W:(ABCD...) | <SP>}}...)) Suppress:(":") Group:(Combine:([{W:(ABCD...) | <SP>}]...))}) at loc 19(3,9)
Matched Group:({Group:(Combine:({{W:(ABCD...) | <SP>}}...)) Suppress:(":") Group:(Combine:([{W:(ABCD...) | <SP>}]...))}) -> [[['Argument'], ['Name']]]
Match Group:({Group:(Combine:({{W:(ABCD...) | <SP>}}...)) Suppress:(":") Group:(Combine:([{W:(ABCD...) | <SP>}]...))}) at loc 1(2,1)
Exception raised:Expected ":", found '#'  (at char 8), (line:2, col:8)
Traceback (most recent call last):
  File "/home/alberto/projects/node/pacmd_list_json/./pactl.py", line 55, in <module>
    parseTree = syntax.parseString(partial)
  File "/usr/local/lib/python3.9/site-packages/pyparsing.py", line 1955, in parseString
    raise exc
  File "/usr/local/lib/python3.9/site-packages/pyparsing.py", line 6336, in checkUnindent
    raise ParseException(s, l, "not an unindent")
pyparsing.ParseException: Expected {{Group:({Group:(W:(ABCD...)) Suppress:("#") Group:(W:(0123...))}) indented block} | {"Properties:" indented block} | Group:({Group:(Combine:({{W:(ABCD...) | <SP>}}...)) Suppress:(":") Group:(Combine:([{W:(ABCD...) | <SP>}]...))}) | Group:({Group:(W:(ABCD...)) Suppress:("=") Group:(Combine:({{W:(ABCD...) | <SP><TAB>}}...))})}, found ':'  (at char 41), (line:4, col:13)

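With setDebug enabled on the value expression, I can see that ZeroOrMore picks up a token from the next line: [[['Argument'], ['Name']]]

I tried LineEnd() and other tricks, but none of them worked.

Any ideas on how to make ZeroOrMore stop at LineEnd(), or otherwise handle this without the special case?

Note: the actual output can be retrieved with: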

env = os.environ.copy()
env['LANG'] = 'C'
data = check_output(
    ['pactl', 'list'], universal_newlines=True, env=env)

Tags: python, parsing, pyparsing

Answer

indentedBlock is not the easiest pyparsing element to work with, and a few things in your grammar are getting in its way.

To debug this, I broke some of the more complex expressions into smaller pieces, gave them names with setName(), and then added .setDebug(), like this:

identifier = Word(alphas, alphanums+"-_.").setName("identifier").setDebug()

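This tells pyparsing to print a message whenever the expression is about to be tried, followed by either the matched tokens if it succeeds or the exception raised if it fails: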

Match identifier at loc 1(2,1)
Matched identifier -> ['Module']
Match identifier at loc 15(3,5)
Matched identifier -> ['Argument']
Match identifier at loc 15(3,5)
Matched identifier -> ['Argument']
Match identifier at loc 23(3,13)
Exception raised:Expected identifier, found ':'  (at char 23), (line:3, col:13)
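
For readers who want to try this debugging setup on its own, here is a minimal sketch. It reuses the identifier expression from above; the two sample inputs and the variable names matched and error_text are only illustrative. The debug messages go to standard output, and the name given to setName() also shows up in the error message.

```python
from pyparsing import ParseException, Word, alphanums, alphas

# Same expression as above: a readable name plus debug tracing.
identifier = Word(alphas, alphanums + "-_.").setName("identifier").setDebug()

# Successful match: prints "Match identifier ..." then "Matched identifier ...".
matched = identifier.parseString("Module #6").asList()

# Failed match: prints the debug trace, then raises ParseException.
try:
    identifier.parseString(": no identifier here")
    error_text = ""
except ParseException as exc:
    # The message uses the name set with setName().
    error_text = str(exc)

print(matched)
print(error_text)
```

Naming the sub-expressions first keeps the trace readable; without setName() each message would carry the full expression repr, as in the output shown in the question.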

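It looks like expressions such as this one interfere with indentedBlock matching by consuming whitespace that should be treated as indentation: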

Combine(OneOrMore(Word(alphanums+'-"/.')|White(' \t')))

The " character and the whitespace in the Word lead me to believe you are trying to match quoted strings. I replaced this expression with:

Combine(OneOrMore(Word(alphas, alphanums+'-/.') | quotedString))
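
As a quick check of that replacement, the sketch below runs it against two sample values taken from the pactl data. The names prop_value, quoted and bare are just for illustration. quotedString keeps the surrounding quotes and the space inside them, so no explicit White() is needed.

```python
from pyparsing import Combine, OneOrMore, Word, alphanums, alphas, quotedString

# Value expression from the answer: bare words or a quoted string.
prop_value = Combine(OneOrMore(Word(alphas, alphanums + "-/.") | quotedString))

# A quoted value containing a space is matched as one token, quotes included.
quoted = prop_value.parseString('"Lennart Poettering"').asList()

# An unquoted value is matched by the Word alternative.
bare = prop_value.parseString("module-alsa-card").asList()

print(quoted)
print(bare)
```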

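You also need to take care not to read past the end of the line, or you will again disrupt indentedBlock's indentation tracking. I added this expression for a newline near the top: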

NL = LineEnd()

Then I used it as the stopOn argument to OneOrMore and ZeroOrMore:

prop_val_value = Combine(OneOrMore(Word(alphas, alphanums+'-/.') | quotedString(), stopOn=NL)).setName("prop_val_value")#.setDebug()
prop_val = Group(identifier + Suppress("=")  + Group(prop_val_value)).setName("prop_val")#.setDebug()
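
To see what stopOn changes, here is a small sketch on a toy two-line input (the input text and the names word, greedy and bounded are made up for the illustration). By default pyparsing treats newlines as skippable whitespace, so a plain repetition continues onto the next line; with stopOn=NL it ends once a line end comes next.

```python
from pyparsing import LineEnd, Word, ZeroOrMore, alphas

NL = LineEnd()
word = Word(alphas)

text = "foo\nbar baz"

# No stopOn: the newline is skipped like any other whitespace,
# so the repetition runs on into the second line.
greedy = ZeroOrMore(word).parseString(text).asList()

# stopOn=NL: before each repetition pyparsing checks whether a line end
# is next, and stops the repetition if so.
bounded = ZeroOrMore(word, stopOn=NL).parseString(text).asList()

print(greedy)
print(bounded)
```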

Here is the parser I ended up with:

indentStack = [1]
stmt = Forward()
NL = LineEnd()

identifier = Word(alphas, alphanums+"-_.").setName("identifier").setDebug()

sect_def = Group(Group(identifier) + Suppress("#") + Group(Word(nums))).setName("sect_def")#.setDebug()
inner_section = indentedBlock(stmt, indentStack)
section = (sect_def + inner_section)

#~ value = Group(Group(Combine(OneOrMore(identifier|White(' ')))) + Suppress(":") + Group(Combine(ZeroOrMore(Word(alphanums+'-/=_".')|White(' ', max=1))))).setDebug()
value_label = originalTextFor(OneOrMore(identifier)).setName("value_label")#.setDebug()
value = Group(value_label
              + Suppress(":")
              + Optional(~NL + Group(Combine(ZeroOrMore(Word(alphanums+'-/=_.') | quotedString(), stopOn=NL))))).setName("value")#.setDebug()
prop_name = Literal("Properties:")
prop_section = indentedBlock(stmt, indentStack)
#~ prop_val = Group(Group(identifier) + Suppress("=")  + Group(Combine(OneOrMore(Word(alphanums+'-"/.')|White(' \t')))))
prop_val_value = Combine(OneOrMore(Word(alphas, alphanums+'-/.') | quotedString(), stopOn=NL)).setName("prop_val_value")#.setDebug()
prop_val = Group(identifier + Suppress("=") + Group(prop_val_value)).setName("prop_val")#.setDebug()
prop = (prop_name + prop_section).setName("prop")#.setDebug()

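stmt << ( section | prop | value | prop_val )

# Driver lines as in the question; the imports and data are unchanged
# from the question's script.
syntax = OneOrMore(stmt)

parseTree = syntax.parseString(data)
parseTree.pprint()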

This produces:

[[['Module'], ['6']],
 [[['Argument']],
  [['Name', ['module-alsa-card']]],
  [['Usage counter', ['0']]],
  ['Properties:',
   [[['module.author', ['"Lennart Poettering"']]],
    [['module.description', ['"ALSA Card"']]],
    [['module.version', ['"14.0-rebootstrapped"']]]]]]]
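
The optional-value handling can also be tried in isolation, without indentedBlock. The sketch below reuses the value expression from the parser above on three flat lines; the names value_body, lines and result, and the flattened sample text, are assumptions made for this illustration. The ~NL lookahead in front of the value appears to be what stops the value expression from skipping the newline as leading whitespace and grabbing the next line's label.

```python
from pyparsing import (Combine, Group, LineEnd, OneOrMore, Optional, Suppress,
                       Word, ZeroOrMore, alphanums, alphas, originalTextFor,
                       quotedString)

NL = LineEnd()
identifier = Word(alphas, alphanums + "-_.")

# Label: one or more identifiers, returned as the original text ("Usage counter").
value_label = originalTextFor(OneOrMore(identifier))

# Value: words or quoted strings, never reading past the end of the line.
value_body = Combine(ZeroOrMore(Word(alphanums + "-/=_.") | quotedString,
                                stopOn=NL))

# The value is optional: ~NL fails when the line ends right after the colon.
value = Group(value_label + Suppress(":") + Optional(~NL + Group(value_body)))

lines = OneOrMore(value)
result = lines.parseString(
    "Argument:\nName: module-alsa-card\nUsage counter: 0"
).asList()

print(result)
```

Here the Argument entry comes back as a group with only a label, while the other two carry a nested value group, which is the same shape as in the full output above.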
