Repeating capture groups

Problem description

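I am trying to write a regular expression that captures the following:

  1. A one-line question (starting with "Q:")
  2. An indeterminate number of paragraphs after that initial capture, stopping before the next "Q:"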

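This is what I have so far, but I want to stress:

NOT WORKING:

What I have so far works for the first two, but when I add a new line it does not capture the subsequent paragraphs.

What am I missing?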

Q: What are the service limits associated with Amazon Athena?
Please click here to learn more about service limits.
 
Q: What is the underlying technology behind Amazon Athena?
Amazon Athena uses Presto with full standard SQL support and works with a variety of standard data formats, including CSV, JSON, ORC, Avro, and Parquet. Athena can handle complex analysis, including large joins, window functions, and arrays. Because Amazon Athena uses Amazon S3 as the underlying data store, it is highly available and durable with data redundantly stored across multiple facilities and multiple devices in each facility. Learn more about Presto here.
 
Q: How does Amazon Athena store table definitions and schema?
Amazon Athena uses a managed Data Catalog to store information and schemas about the databases and tables that you create for your data stored in Amazon S3. In regions where AWS Glue is available, you can upgrade to using the AWS Glue Data Catalog with Amazon Athena. In regions where AWS Glue is not available, Athena uses an internal Catalog.
You can modify the catalog using DDL statements or via the AWS Management Console. Any schemas you define are automatically saved unless you explicitly delete them. Athena uses schema-on-read technology, which means that your table definitions applied to your data in S3 when queries are being executed. There’s no data loading or transformation required. You can delete table definitions and schema without impacting the underlying data stored on Amazon S3.

Tags: python, regex

Solution


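You can use the following pattern (with the multiline flag enabled, so that ^ matches at the start of each line):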

^(Q:.*?\n)(?!Q:)([\s\S]+?(?=^Q:|\Z))


Breakdown:

^(Q:.*?\n)     # Matches "Q:" at the beginning of the line, followed by
               # some optional text ending with a line-feed.
(?!Q:)         # Not immediately followed by another "Q:".
(              # Start of the second capturing group.
    [\s\S]+?   # Matches one or more characters (including line breaks) - non-greedy.
    (?=^Q:|\Z) # Stop when the next line starts with "Q:" or the end of the string is reached.
)              # End of the second capturing group.
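
As a minimal sketch of how the pattern above might be applied in Python (the question is tagged python), the snippet below compiles it with re.MULTILINE and collects each question together with its answer text. The helper name parse_faq and the shortened sample text are hypothetical and used only for illustration; they are not part of the original question or answer.

```python
import re

# Pattern from the answer. re.MULTILINE makes ^ match at the start of
# every line, which the pattern relies on to find each "Q:" line.
PATTERN = re.compile(r"^(Q:.*?\n)(?!Q:)([\s\S]+?(?=^Q:|\Z))", re.MULTILINE)

# Shortened, made-up sample in the same shape as the question's input.
SAMPLE = (
    "Q: First question?\n"
    "Answer one.\n"
    " \n"
    "Q: Second question?\n"
    "Answer two, paragraph one.\n"
    "Answer two, paragraph two.\n"
)


def parse_faq(text):
    """Return (question, answer) tuples with surrounding whitespace stripped."""
    return [(q.strip(), a.strip()) for q, a in PATTERN.findall(text)]


pairs = parse_faq(SAMPLE)
for question, answer in pairs:
    print(question)
    print(answer)
    print("---")
```

Because of the (?!Q:) lookahead, a question line that is immediately followed by another "Q:" line produces no match, so a question with no answer text would be skipped rather than returned with an empty second group.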
