sql - Oracle SQL: Extracting multiple pieces of text between two characters
Problem description
I have the following table:
|-------------|---------------------------------------------------------------------------|
| ID.         | CONTENT                                                                   |
|-------------|---------------------------------------------------------------------------|
| 1           | <TITLE> <SUB-TITLE-1> Content <SUB-TITLE-2>Content.                       |
| 2           | <TITLE> <SUB-TITLE-1> Content <SUB-TITLE-2>Content.                       |
| 3           | <TITLE> <SUB-TITLE-1> Content <SUB-TITLE-2>Content. <SUB-TITLE-3> Content |
|-------------|---------------------------------------------------------------------------|
I want to extract all the text between < and >, so the result would look like this:
|-------------|-------------------------------------------------|
| ID.         | CONTENT                                         |
|-------------|-------------------------------------------------|
| 1           | TITLE                                           |
| 1           | SUB-TITLE-1                                     |
| 1           | SUB-TITLE-2                                     |
| 2           | TITLE                                           |
| 2           | SUB-TITLE-1                                     |
| 2           | SUB-TITLE-2                                     |
| 3           | TITLE                                           |
| 3           | SUB-TITLE-1                                     |
| 3           | SUB-TITLE-2                                     |
| 3           | SUB-TITLE-3                                     |
|-------------|-------------------------------------------------|
How can I do this? I have been trying with regular expressions, but I think I am lost. My Oracle version is 18c, if that helps.
Solution
You can use the fourth parameter of REGEXP_SUBSTR to specify which occurrence of the match to return.
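As a quick illustration of that parameter, here is a minimal sketch run against a made-up string literal (not the asker's table). It also shows the sixth parameter, which returns only the capture group rather than the whole match; passing NULL for the fifth (match options) parameter is assumed to mean "default matching".

```sql
-- Sketch only: the literal is sample data invented for this illustration.
SELECT REGEXP_SUBSTR('<TITLE> <SUB-TITLE-1> Content', '<(.*?)>', 1, 1)          AS first_match,   -- 1st occurrence, brackets included
       REGEXP_SUBSTR('<TITLE> <SUB-TITLE-1> Content', '<(.*?)>', 1, 2)          AS second_match,  -- 2nd occurrence, brackets included
       REGEXP_SUBSTR('<TITLE> <SUB-TITLE-1> Content', '<(.*?)>', 1, 2, NULL, 1) AS second_group,  -- 2nd occurrence, capture group only
       REGEXP_SUBSTR('<TITLE> <SUB-TITLE-1> Content', '<(.*?)>', 1, 3)          AS third_match    -- no 3rd occurrence, so NULL
FROM dual;
```

This should return <TITLE>, <SUB-TITLE-1>, SUB-TITLE-1 and NULL respectively. The NULL for a missing occurrence is what makes it possible to filter out unused occurrence numbers.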
To get one row each for the 1st, 2nd, and 3rd occurrences, you can cross join against a subquery on dual that supplies the occurrence numbers:
WITH test_data AS (
    SELECT 1 AS content_id, '<TITLE> <SUB-TITLE-1> Content<SUB-TITLE-2>Content.<A third sub-title>' AS content_data FROM dual UNION ALL
    SELECT 2 AS content_id, '<TITLE> <SUB-TITLE-1> Content<SUB-TITLE-2>Content.' AS content_data FROM dual
)
SELECT t.content_id,
       /* The sixth argument (1) returns only the capture group, i.e. the text without the angle brackets */
       REGEXP_SUBSTR(t.content_data, '<(.*?)>', 1, s.match_occurrence, 'i', 1) AS content_match
FROM test_data t
CROSS JOIN (
    SELECT 1 AS match_occurrence FROM dual UNION ALL
    SELECT 2 AS match_occurrence FROM dual UNION ALL
    SELECT 3 AS match_occurrence FROM dual UNION ALL
    SELECT 4 AS match_occurrence FROM dual
    /* ... etc, with the number of rows equal to the maximum number of matches that can appear */
) s
WHERE REGEXP_SUBSTR(t.content_data, '<.*?>', 1, s.match_occurrence) IS NOT NULL /* Only return records that have a match for the given occurrence */
ORDER BY t.content_id, s.match_occurrence;
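If writing one SELECT per occurrence number gets unwieldy, the list of occurrence numbers can be generated instead. The following is a minimal sketch of that idea, assuming no row ever contains more than 10 tags; the cap of 10 and the occurrences name are hypothetical choices for illustration, so adjust them to your data.

```sql
WITH test_data AS (
    SELECT 1 AS content_id, '<TITLE> <SUB-TITLE-1> Content<SUB-TITLE-2>Content.<A third sub-title>' AS content_data FROM dual UNION ALL
    SELECT 2 AS content_id, '<TITLE> <SUB-TITLE-1> Content<SUB-TITLE-2>Content.' AS content_data FROM dual
),
occurrences AS (
    -- Row generator: yields 1, 2, ..., 10 (10 is an assumed upper bound on tags per row)
    SELECT LEVEL AS match_occurrence
    FROM dual
    CONNECT BY LEVEL <= 10
)
SELECT t.content_id,
       REGEXP_SUBSTR(t.content_data, '<(.*?)>', 1, s.match_occurrence, NULL, 1) AS content_match
FROM test_data t
CROSS JOIN occurrences s
-- Drop occurrence numbers beyond the last tag actually present in the row
WHERE REGEXP_SUBSTR(t.content_data, '<.*?>', 1, s.match_occurrence) IS NOT NULL
ORDER BY t.content_id, s.match_occurrence;
```

The trade-off is the same as with the hand-written list: any tags beyond the chosen cap are silently ignored, so the cap has to be at least the largest number of tags in any row.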
Borrowing CONNECT BY LEVEL from Barbaros's excellent answer, you can do this more concisely:
WITH test_data AS (
    SELECT 1 AS content_id, '<TITLE> <SUB-TITLE-1> Content<SUB-TITLE-2>Content.<A third sub-title>' AS content_data FROM dual UNION ALL
    SELECT 2 AS content_id, '<TITLE> <SUB-TITLE-1> Content<SUB-TITLE-2>Content.' AS content_data FROM dual
)
SELECT t.content_id,
       REGEXP_SUBSTR(t.content_data, '<(.*?)>', 1, LEVEL, 'i', 1) AS content_match
FROM test_data t
CONNECT BY
    LEVEL <= REGEXP_COUNT(t.content_data, '<.*?>') /* One level per match in the row */
    AND PRIOR sys_guid() IS NOT NULL               /* Gives each row a unique value so no cycle is detected */
    AND PRIOR content_id = content_id              /* Keep each hierarchy within the same source row */
ORDER BY t.content_id, LEVEL;
Note that the CONNECT BY LEVEL approach can be slow on large data sets, so I would avoid it if performance is a concern.
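If the correlated CONNECT BY does turn out to be slow, one alternative worth measuring is a lateral join, which Oracle supports from 12c onward (the question mentions 18c). The sketch below reuses the same sample data and generates the occurrence numbers separately for each row. It has not been benchmarked here, so any speed-up is an assumption to verify on your own data.

```sql
WITH test_data AS (
    SELECT 1 AS content_id, '<TITLE> <SUB-TITLE-1> Content<SUB-TITLE-2>Content.<A third sub-title>' AS content_data FROM dual UNION ALL
    SELECT 2 AS content_id, '<TITLE> <SUB-TITLE-1> Content<SUB-TITLE-2>Content.' AS content_data FROM dual
)
SELECT t.content_id,
       REGEXP_SUBSTR(t.content_data, '<(.*?)>', 1, s.match_occurrence, NULL, 1) AS content_match
FROM test_data t
CROSS JOIN LATERAL (
    -- Per-row generator: one occurrence number for each tag found in this row
    SELECT LEVEL AS match_occurrence
    FROM dual
    CONNECT BY LEVEL <= REGEXP_COUNT(t.content_data, '<.*?>')
) s
ORDER BY t.content_id, s.match_occurrence;
```

One caveat: a row with no tags at all still produces a single row with a NULL content_match, because the generator always returns its first level. Adding WHERE REGEXP_COUNT(t.content_data, '<.*?>') > 0 filters those rows out if they are not wanted.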