Capturing a Shakespeare character's dialogue with a regular expression

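Question

I am trying to capture Shakespeare dialogue with a regular expression, as practice in using regex for text matching. For example, I want to capture all the text spoken by the character called CALIBAN in this particular scene: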

  PROSPERO. Thou most lying slave,
    Whom stripes may move, not kindness! I have us'd thee,
    Filth as thou art, with human care, and lodg'd thee
    In mine own cell, till thou didst seek to violate
    The honour of my child.

  CALIBAN. O ho, O ho! Would't had been done.
    Thou didst prevent me. I had peopl'd else
    This isle with Calibans.

  PROSPERO. Thou most lying slave,
    Whom stripes may move, not kindness! I have us'd thee,
    Filth as thou art, with human care, and lodg'd thee
    In mine own cell, till thou didst seek to violate
    The honour of my child.

  CALIBAN. O ho, O ho! Would't had been done.
    Thou didst prevent me. I had peopl'd else
    This isle with Calibans.

I want to capture

O ho, O ho! Would't had been done.
        Thou didst prevent me. I had peopl'd else
        This isle with Calibans.

How would I do this with a regular expression? I tried this particular regex:

(?<=\n  CALIBAN\. )[A-Za-z ',\.\n\!-]+(?=\n  PROSPERO\. |$)

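Note: in the actual text there are always 2 space characters, followed by the name of the new character. Every line ends with a carriage return. My regex looks for CALIBAN. at the start, then matches some text, and requires the match to end just before PROSPERO. However, when I plug it into regexp.com, my whole text is matched.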

Tags: regex, regex-lookarounds

Answer


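The character class in your pattern also matches capital letters, periods, spaces and newlines, so the greedy + runs straight through every PROSPERO line and only stops where the lookahead can finally succeed, at the end of the text ($). A lazy quantifier (+?) stops at the first position where the lookahead succeeds instead. Using \s in the class also covers the carriage returns you mention. You can use this regex with a lazy quantifier: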

(?<=\n  CALIBAN\. )[A-Za-z\s',.!-]+?(?=\n  PROSPERO\. |$)
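To make the greedy versus lazy difference concrete, here is a minimal Python sketch (Python rather than the PHP used in this answer, only because it is easy to run). It applies both patterns to the sample scene from the question. The names SCENE, GREEDY and LAZY are illustrative, and the sample is assumed to use plain \n line endings.

```python
import re

# Sample scene from the question: two spaces before each speaker name,
# four spaces before continuation lines, a blank line between speeches.
# The exchange appears twice, as in the question.
SCENE = (
    "  PROSPERO. Thou most lying slave,\n"
    "    Whom stripes may move, not kindness! I have us'd thee,\n"
    "    Filth as thou art, with human care, and lodg'd thee\n"
    "    In mine own cell, till thou didst seek to violate\n"
    "    The honour of my child.\n"
    "\n"
    "  CALIBAN. O ho, O ho! Would't had been done.\n"
    "    Thou didst prevent me. I had peopl'd else\n"
    "    This isle with Calibans.\n"
    "\n"
) * 2

# The asker's original pattern (greedy +).
GREEDY = r"(?<=\n  CALIBAN\. )[A-Za-z ',\.\n\!-]+(?=\n  PROSPERO\. |$)"
# The pattern from this answer (lazy +?, \s in the class).
LAZY = r"(?<=\n  CALIBAN\. )[A-Za-z\s',.!-]+?(?=\n  PROSPERO\. |$)"

greedy_matches = re.findall(GREEDY, SCENE)
lazy_matches = re.findall(LAZY, SCENE)

# Greedy: one long match that swallows the following PROSPERO speech.
print(len(greedy_matches), "PROSPERO" in greedy_matches[0])

# Lazy: one match per CALIBAN speech.
for speech in lazy_matches:
    print(repr(speech))
```

The lazy matches still carry the indentation of the continuation lines and a trailing newline, so trim them afterwards if you need clean text.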


Usage in PHP:

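// $str is assumed to hold the text of the scene.
$re = '/(?<=\n  CALIBAN\. )[A-Za-z\s\',.!-]+?(?=\n  PROSPERO\. |$)/';

// With the default PREG_PATTERN_ORDER, $matches[0] lists every full match.
preg_match_all($re, $str, $matches);

// Print all of CALIBAN's speeches
print_r($matches[0]);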
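The question mentions that every line ends with a carriage return. As a rough check, the Python sketch below (Python instead of PHP, purely for convenience) runs the same lazy pattern over a shortened excerpt with \r\n line endings and tidies each captured speech into stripped lines. The helper name caliban_speeches and the excerpt scene_crlf are made up for illustration, and the \r\n endings are an assumption about the real file.

```python
import re

# Same lazy pattern as in the answer; \s in the class also matches "\r".
LAZY = re.compile(r"(?<=\n  CALIBAN\. )[A-Za-z\s',.!-]+?(?=\n  PROSPERO\. |$)")


def caliban_speeches(text):
    """Return each CALIBAN speech as a list of stripped, non-empty lines."""
    speeches = []
    for raw in LAZY.findall(text):
        # splitlines() handles "\n", "\r\n" and a lone "\r" alike.
        lines = [line.strip() for line in raw.splitlines()]
        speeches.append([line for line in lines if line])
    return speeches


# Shortened excerpt with Windows-style line endings (assumed, not from the source file).
scene_crlf = (
    "  PROSPERO. The honour of my child.\r\n"
    "\r\n"
    "  CALIBAN. O ho, O ho! Would't had been done.\r\n"
    "    Thou didst prevent me. I had peopl'd else\r\n"
    "    This isle with Calibans.\r\n"
    "\r\n"
    "  PROSPERO. Thou most lying slave,\r\n"
)

for speech in caliban_speeches(scene_crlf):
    print(speech)
```

Note that the asker's original class, which lists \n but not \r, would stop at the first carriage return in such a file, which is one more reason to prefer \s here.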
