Java regex: removing two different types of comments from a string

Problem description

My text contains two types of comments: those delimited by % on both sides, and those that start with /* and end with */. For example:

INPUT1: Sarah was going out. % Remember she usually doesn't go out % It was very cold.

DESIRED_OUTPUT1: Sarah was going out. It was very cold.

INPUT2: Sarah was going out. /* Remember she usually doesn't go out */ It was very cold.

DESIRED_OUTPUT2: Sarah was going out. It was very cold.

INPUT3: Charles knocked on the door and a woman opened it. % Hmm, is this good... /* Not sure */ Perhaps this should happen in chapter 10 instead? % She looked at him. - Yes?, she said.

DESIRED_OUTPUT3: Charles knocked on the door and a woman opened it. She looked at him. - Yes?, she said.

INPUT4: Charles knocked on the door and a woman opened it. % Hmm, is this good... /* Not sure to 100% */ Perhaps this should happen in chapter 10 instead? % She looked at him. - Yes?, she said.

DESIRED_OUTPUT4: Charles knocked on the door and a woman opened it. */ Perhaps this should happen in chapter 10 instead?

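Basically, when an opening comment marker is encountered, I want everything removed up to its corresponding closing marker, even if that means removing markers of the other comment type.

If a comment is opened with % or /* but never closed, the comment is assumed to run to the end of the text. However, if only a closing marker */ is left (because its opener was inside another comment and was removed with it), that marker should stay in the text.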

Tags: java, regex, replace

Solution

You can call replaceAll on the input string with the following pattern:

.replaceAll("%[^%]*%?|/\\*[^*]*(?:\\*(?!/)[^*]*)*(?:\\*/)?","")
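
As a rough sketch of how this call could be tried out, the class below applies the same pattern to the four inputs from the question. The class name CommentStripper and the method name strip are made up for illustration; only the pattern itself comes from the answer.

```java
import java.util.regex.Pattern;

// Hypothetical helper class; the name is not part of the original answer.
final class CommentStripper {
    // The pattern from the answer, compiled once so it can be reused.
    private static final Pattern COMMENT = Pattern.compile(
            "%[^%]*%?|/\\*[^*]*(?:\\*(?!/)[^*]*)*(?:\\*/)?");

    // Removes both comment styles; an unclosed comment runs to the end of the text.
    static String strip(String text) {
        return COMMENT.matcher(text).replaceAll("");
    }

    public static void main(String[] args) {
        String[] inputs = {
            "Sarah was going out. % Remember she usually doesn't go out % It was very cold.",
            "Sarah was going out. /* Remember she usually doesn't go out */ It was very cold.",
            "Charles knocked on the door and a woman opened it. % Hmm, is this good... "
                + "/* Not sure */ Perhaps this should happen in chapter 10 instead? "
                + "% She looked at him. - Yes?, she said.",
            "Charles knocked on the door and a woman opened it. % Hmm, is this good... "
                + "/* Not sure to 100% */ Perhaps this should happen in chapter 10 instead? "
                + "% She looked at him. - Yes?, she said."
        };
        for (String input : inputs) {
            // Brackets make leading and trailing spaces visible in the output.
            System.out.println("[" + strip(input) + "]");
        }
    }
}
```

Note that the pattern removes only the comment itself, so the space before and the space after a removed comment both remain. If single spaces are required, as in the desired outputs, a separate whitespace clean-up step would be needed; that step is not part of the answer.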


Details

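  • %[^%]*%? - a %...% style comment with an optional trailing delimiter:
    • % - a % character
    • [^%]* - zero or more characters other than %
    • %? - an optional % character
  • | - or
  • /\*[^*]*(?:\*(?!/)[^*]*)*(?:\*/)? - a /*...*/ style comment with an optional trailing delimiter:
    • /\* - the string /*
    • [^*]* - zero or more characters other than *
    • (?:\*(?!/)[^*]*)* - zero or more occurrences of:
      • \*(?!/) - a * that is not followed by /
      • [^*]* - zero or more characters other than *
    • (?:\*/)? - an optional */ substring.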
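
The breakdown above can also be written into the pattern itself. The following is a minimal sketch that assumes Pattern.COMMENTS mode, in which whitespace and #-comments inside the pattern are ignored. The class name AnnotatedCommentPattern and the method strip are hypothetical, and the edge-case strings in main are invented to exercise the optional closing delimiters.

```java
import java.util.regex.Pattern;

// Hypothetical class name; the pattern is the one from the answer, spread over lines.
final class AnnotatedCommentPattern {
    static final Pattern COMMENT = Pattern.compile(
            "  %                      # an opening percent sign\n"
          + "  [^%]*                  # zero or more characters other than a percent sign\n"
          + "  %?                     # an optional closing percent sign\n"
          + "|                        # or\n"
          + "  /\\*                    # the opening slash-star\n"
          + "  [^*]*                  # zero or more characters other than a star\n"
          + "  (?:\\*(?!/)[^*]*)*      # stars not followed by a slash, each with trailing non-stars\n"
          + "  (?:\\*/)?               # an optional closing star-slash\n",
            Pattern.COMMENTS);

    static String strip(String text) {
        return COMMENT.matcher(text).replaceAll("");
    }

    public static void main(String[] args) {
        // Unclosed comments run to the end of the text.
        System.out.println("[" + strip("a % b") + "]");
        System.out.println("[" + strip("a /* b") + "]");
        // A closing marker without an opener is left alone.
        System.out.println("[" + strip("a */ b") + "]");
        // Stars inside a block comment do not end it early.
        System.out.println("[" + strip("x /* a * b ** c */ y") + "]");
    }
}
```

Making both closing delimiters optional is what lets an unclosed comment extend to the end of the text, while a stray */ survives because neither alternative can begin at a * character.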
