(TOP (S (NP (DT The)
            (NNP Fulton)
            (NNP County)
            (NNP Grand)
            (NNP Jury))
        (VP (VBD said)
            (NP (NNP Friday))
            (SBAR (-NONE- 0)
                  (S (NP (DT an)
                         (NN investigation)
                         (PP (IN of)
                             (NP (NP (NNP Atlanta))
                                 (POS 's)
                                 (JJ recent)
                                 (JJ primary)
                                 (NN election))))
                     (VP (VBD produced)
                         (NP (`` ``)
                             (DT no)
                             (NN evidence)
                             ('' '')
                             (SBAR (IN that)
                                   (S (NP (DT any)
                                          (NNS irregularities))
                                      (VP (VBD took)
                                          (NP (NN place)))))))))))
     (. .))


DT The NNP Fulton NNP County NNP Grand NNP Jury VBD said NNP Friday DT
an NN investigation ...

Is there an algorithm for parsing the above, or do we need to use regular expressions to do it? I don't want to use the NLTK package for this.

Pyparsing makes quick work of parsing nested expressions.

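import pyparsing as pp

# `sample` is assumed to hold the bracketed parse tree from the question as a string.
LPAR, RPAR = map(pp.Suppress, "()")
expr = pp.Forward()
# Node labels: uppercase tags such as NP or -NONE-, plus the punctuation tags.
label = pp.Word(pp.alphas.upper()+'-') | "''" | "``" | "."
# Leaf tokens: any run of printable characters other than parentheses.
word = pp.Literal(".") | "''" | "``" | pp.Word(pp.printables, excludeChars="()")

# A node is a label followed by either a single token or one or more child nodes.
expr <<= LPAR + label + (word | pp.OneOrMore(expr)) + RPAR

result = pp.OneOrMore(expr).parseString(sample)
print(' '.join(result))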


TOP S NP DT The NNP Fulton NNP County NNP Grand NNP Jury VP VBD said NP NNP Friday SBAR -NONE- 0 S NP DT an NN investigation PP IN of NP NP NNP Atlanta POS 's JJ recent JJ primary NN election VP VBD produced NP `` `` DT no NN evidence '' '' SBAR IN that S NP DT any NNS irregularities VP VBD took NP NN place . .

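Typically, a pyparsing parser like this would use pp.Group(expr) to retain the grouping of the nested elements. In your case, though, since you ultimately want a flat list anyway, we simply leave it out: pyparsing's default behavior is to return just a flat list of the matched strings.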
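To make the nested-versus-flat distinction concrete without any third-party package, here is a rough sketch of a hand-written recursive parser. It is not how pyparsing works internally; `parse_tree` and `flatten` are hypothetical helper names, and the sketch assumes well-formed, balanced input with no error recovery.

```python
import re

# Tokens are single parentheses or runs of non-space, non-parenthesis characters.
TOKEN = re.compile(r"[()]|[^\s()]+")

def parse_tree(text):
    """Parse one bracketed tree into nested lists of the form [label, child, ...]."""
    tokens = TOKEN.findall(text)
    pos = 0

    def node():
        nonlocal pos
        if tokens[pos] != "(":
            raise ValueError("expected '(' at token %d" % pos)
        pos += 1
        items = [tokens[pos]]          # first token after "(" is the node label
        pos += 1
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                items.append(node())   # child node: recurse
            else:
                items.append(tokens[pos])  # leaf token
                pos += 1
        pos += 1                       # consume the closing ")"
        return items

    return node()

def flatten(tree):
    """Yield every label and token of a nested tree in document order."""
    for item in tree:
        if isinstance(item, list):
            yield from flatten(item)
        else:
            yield item

nested = parse_tree("(NP (DT The) (NNP Jury))")
print(nested)                     # ['NP', ['DT', 'The'], ['NNP', 'Jury']]
print(" ".join(flatten(nested)))  # NP DT The NNP Jury
```

The nested form keeps the tree structure, which is roughly what grouping preserves; flattening it gives the same kind of label/token sequence as the ungrouped result.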
