sql - How to extract all text between single periods (i.e. ignoring `...`) using an re2 regex?

Problem description

How can I use an re2 regex to extract all text between single periods (i.e. ignoring `...`)?

I am using the `REGEXP_EXTRACT_ALL` function in BigQuery, which uses re2 syntax (https://github.com/google/re2/wiki/Syntax).

Given the following example:

This is... a.. sentence. It is just an example.

I would like to extract the two sentences

`This is... a.. sentence.` and `It is just an example.`

I am particularly interested in whether this can be done in BigQuery with SQL functions alone, without bringing in other tools.
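Since re2 supports neither lookbehind nor lookahead, one possible single-pattern approach (a sketch, not taken from the original question or answer) is to let a sentence body consume either non-period characters or runs of two or more periods, and to end it at a single period followed by whitespace or the end of the text. The snippet below tests such a pattern with Python's `re` module as a stand-in for re2; the pattern only uses syntax both engines accept, but the pattern itself and the helper name `extract_sentences` are hypothetical. In BigQuery the same pattern would presumably be passed as `REGEXP_EXTRACT_ALL(text, r'([^.\s](?:[^.]|\.{2,})*\.)(?:\s|$)')`, which has not been verified here.

```python
import re

# Hypothetical pattern (not from the question). It uses only re2-compatible syntax:
#   [^.\s]            first character: not a period and not whitespace
#   (?:[^.]|\.{2,})*  any non-period character, or a run of two or more periods
#   \.                the single period that ends the sentence
#   (?:\s|$)          which must be followed by whitespace or the end of the text
# Only group 1 is kept, mirroring how REGEXP_EXTRACT_ALL returns the capturing
# group when the pattern contains exactly one.
SENTENCE = re.compile(r"([^.\s](?:[^.]|\.{2,})*\.)(?:\s|$)")


def extract_sentences(text: str) -> list[str]:
    # re.findall returns the captured group of each non-overlapping match.
    return SENTENCE.findall(text)


print(extract_sentences("This is... a.. sentence. It is just an example."))
# ['This is... a.. sentence.', 'It is just an example.']
```

A known limitation of this sketch: a single period only ends a sentence when whitespace or the end of the text follows it, so a period directly followed by a letter does not split the text.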

Tags: sql, regex, google-bigquery, re2

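Solution

Consider the following workaround, which turns each single period into a delimiter and then splits the text on it: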

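-- Wrap each run of periods in '#', turn the single ones ('#.#') into a '####' delimiter,
-- trim '#' from the ends, split on the delimiter, then unwrap the remaining '#...#' runs.
select text, regexp_replace(sentence, r'(#)(\.+)(#)', r'\2') sentence
from `project.dataset.table`,
unnest(split(trim(regexp_replace(regexp_replace(text, r'(\.+)', r'#\1#'), r'(\#\.\#)', r'####'), '####'), '####')) sentence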

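Applied to the sample data in your question, this returns two rows, `This is... a.. sentence` and `It is just an example` (the latter with a leading space). The single periods serve as the delimiter, so they are consumed by the split and do not appear in the output.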
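To make the nested calls in the query easier to follow, the sketch below replays the same sequence of replacements on the sample sentence, using Python's `re` module as a stand-in for re2. BigQuery's `TRIM(value, '####')` treats its second argument as a set of characters to strip, so it is approximated with `str.strip("#")`, and `SPLIT` with `str.split("####")`. The helper name `split_like_workaround` is hypothetical, and, like the query, the sketch assumes the input contains no `#` characters of its own.

```python
import re


def split_like_workaround(text: str) -> list[str]:
    # 1. Wrap every run of periods in '#': "a.." -> "a#..#", "sentence." -> "sentence#.#"
    marked = re.sub(r"(\.+)", r"#\1#", text)
    # 2. A run of exactly one period now reads "#.#"; replace it with the delimiter.
    delimited = re.sub(r"(\#\.\#)", "####", marked)
    # 3. Approximates TRIM(..., '####'): strip '#' characters from both ends.
    trimmed = delimited.strip("#")
    # 4. Split on the delimiter, then turn each remaining "#...#" back into "...".
    return [re.sub(r"(#)(\.+)(#)", r"\2", part) for part in trimmed.split("####")]


print(split_like_workaround("This is... a.. sentence. It is just an example."))
# ['This is... a.. sentence', ' It is just an example']
```

If the leading space is unwanted, wrapping the outer `regexp_replace` of the query in `TRIM(...)` should remove it; that tweak is a suggestion and not part of the original answer.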
