Need a regex to extract only sets of uppercase words before the end of the line

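Problem description

I want to create a regular expression that extracts a set of uppercase words (separated by spaces) on a single line.

For example, in this text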

    TOPIC ONE
    Description of this topic, one CAPITAL word
    TOPIC NUMBER TWO
    Description of this topic two CAPITAL word


I only need to select TOPIC ONE and TOPIC NUMBER TWO, not the word CAPITAL.

I tried the following regex

    \b[A-Z]+\b

which extracts the uppercase words individually.

I also tried

    \b[A-Z]+\ \b

but it selects everything except the last uppercase word.
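
What the two attempts actually match can be checked with a short sketch. Python's built-in `re` module is an assumption here (the question does not name a regex flavor), and the variable names are purely illustrative.

```python
import re

# The first example text from the question (indentation dropped).
text = (
    "TOPIC ONE\n"
    "Description of this topic, one CAPITAL word\n"
    "TOPIC NUMBER TWO\n"
    "Description of this topic two CAPITAL word\n"
)

# Attempt 1: matches every all-caps word on its own,
# including the unwanted CAPITAL.
single_words = re.findall(r"\b[A-Z]+\b", text)

# Attempt 2: an all-caps word must be followed by a literal space,
# so the last word of each heading (followed by a newline) is skipped.
words_with_space = re.findall(r"\b[A-Z]+\ \b", text)

print(single_words)
print(words_with_space)
```

Neither attempt groups the words of a heading into one match, which is what the requirement of selecting only multiple words calls for.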

I want to make sure the regex always selects only runs of multiple words.

Here is the sample text to test:


    CHIEF COMPLAINT  Weakness inability to talk

    HISTORY OF THE PRESENT ILLNESS  This is a yearold
    AfricanAmerican male with a history of hypertension who was
    in his usual state of health

    FAMILY HISTORY  Unknown

    SOCIAL HISTORY  The patient lives 

    PHYSICAL EXAMINATION ON ADMISSION  During the five minute
    examination the patient became progressively less responsive
    and then vomited requiring intubation and paralytics during
    the examination 

Tags: regex

Solution

You can use

    \b[A-Z]+(?:\s+[A-Z]+)+\b
    \b[A-Z]+(?:[^\S\r\n]+[A-Z]+)+\b
    \b\p{Lu}+(?:\h+\p{Lu}+)+\b

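Details

  • \b - a word boundary
  • [A-Z]+ - one or more uppercase ASCII letters (\p{Lu} matches any Unicode uppercase letter)
  • (?:\s+[A-Z]+)+ - one or more consecutive occurrences of:
    • \s+ - one or more whitespace characters ([^\S\r\n]+, \h+ or [\p{Zs}\t]+ match one or more horizontal whitespace characters only)
    • [A-Z]+ - one or more uppercase ASCII letters
  • \b - a word boundary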
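
As a rough check against the sample text from the question, the horizontal-whitespace variant can be run as follows. This is a minimal sketch assuming Python's `re` module; `sample` and `headings` are illustrative names, and the sample text is reproduced without its indentation.

```python
import re

# Sample text from the question, indentation dropped.
sample = (
    "CHIEF COMPLAINT  Weakness inability to talk\n"
    "\n"
    "HISTORY OF THE PRESENT ILLNESS  This is a yearold\n"
    "AfricanAmerican male with a history of hypertension who was\n"
    "in his usual state of health\n"
    "\n"
    "FAMILY HISTORY  Unknown\n"
    "\n"
    "SOCIAL HISTORY  The patient lives \n"
    "\n"
    "PHYSICAL EXAMINATION ON ADMISSION  During the five minute\n"
    "examination the patient became progressively less responsive\n"
    "and then vomited requiring intubation and paralytics during\n"
    "the examination \n"
)

# Two or more all-caps ASCII words separated by horizontal whitespace only.
pattern = re.compile(r"\b[A-Z]+(?:[^\S\r\n]+[A-Z]+)+\b")

headings = pattern.findall(sample)
for heading in headings:
    print(heading)
```

Capitalized words such as "Weakness" or "This" end the match because the trailing \b cannot sit between an uppercase letter and the lowercase letter that follows it, so the engine backtracks to the last all-caps word.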
