sql - BigQuery comma delimited string evaluation
问题描述
I am using BigQuery and I'm trying to parse a comma delimited string to find specific numbers within it.
Example table as below
|--------|-----------------------------------------------------------|
| userID | sequence |
|--------|-----------------------------------------------------------|
| 123abc | 1,2,3,4,5,6,7,8 |
|--------|-----------------------------------------------------------|
| 456bcd | 1,2,3,4,5,6,7,8,9,10,11 |
|--------|-----------------------------------------------------------|
| 789def | 1,2,3,4 |
|--------|-----------------------------------------------------------|
I need to create a CASE statement where each value of the string 'sequence' is evaluated as per the following logic and the result outputted to its own column.
SELECT userID
,sequence
,CASE WHEN sequence CONTAINS '1' THEN 1 ELSE 0 END AS action1
,CASE WHEN sequence CONTAINS '2' THEN 1 ELSE 0 END AS action2
,CASE WHEN sequence CONTAINS '3' THEN 1 ELSE 0 END as action3
....
,CASE WHEN sequence CONTAINS '9' AND '11' THEN 1 ELSE 0 END as action10
This would produce the following output.
|--------|-------------------------|-------|-------|-------|---------|
| userID | sequence |action1|action2|action3|action10 |
|--------|-------------------------|-------|-------|-------|---------|
| 123abc | 1,2,3,4,5,6,7,8 | 1 | 1 | 1 | 0 |
|--------|-------------------------|-------|-------|-------|---------|
| 456bcd | 1,2,3,4,5,6,7,8,9,10,11 | 1 | 1 | 1 | 1 |
|--------|-------------------------|-------|-------|-------|---------|
| 789def | 1,2 | 1 | 1 | 0 | 0 |
|--------|-------------------------|-------|-------|-------|---------|
Please not the last CASE WHEN statement is very important as I need to account for this very specific combination of string values as its own unique action.
I believe this would be achievable in SQL Server using something like:
CASE WHEN CHARINDEX('1', 'sequence')>0 THEN 1 ELSE 0 END as action1
,CASE WHEN CHARINDEX('2', 'sequence')>0 THEN 1 ELSE 0 END as action2
,CASE WHEN CHARINDEX('3', 'sequence')>0 THEN 1 ELSE 0 END as action3
...
,CASE WHEN CHARINDEX('9', 'sequence')>0 AND CHARINDEX('11', 'sequence')>0 THEN 1 ELSE 0 END as action10
However I cannot find an equivalent function in BigQuery that would achieve the same result and my attempts at REGEX have also fallen short.
I'd be very grateful for some guidance here. Thanks in advance.
解决方案
see direction below (for BigQuery Standard SQL)
#standardSQL
WITH `project.dataset.table` AS (
SELECT '123abc' userID, '1,2,3,4,5,6,7,8' sequence UNION ALL
SELECT '456bcd', '1,2,3,4,5,6,7,8,9,10,11' UNION ALL
SELECT '789def', '1,2'
)
SELECT userID,
IF((SELECT COUNT(1) FROM UNNEST(SPLIT(sequence)) value WHERE value = '1' ) > 0, 1, 0) action1,
IF((SELECT COUNT(1) FROM UNNEST(SPLIT(sequence)) value WHERE value = '2' ) > 0, 1, 0) action2,
IF((SELECT COUNT(1) FROM UNNEST(SPLIT(sequence)) value WHERE value = '3' ) > 0, 1, 0) action3,
IF((SELECT COUNT(1) FROM UNNEST(SPLIT(sequence)) value WHERE value = '4' ) > 0, 1, 0) action4,
IF((SELECT COUNT(1) FROM UNNEST(SPLIT(sequence)) value WHERE value = '5' ) > 0, 1, 0) action5,
IF((SELECT COUNT(1) FROM UNNEST(SPLIT(sequence)) value WHERE value = '6' ) > 0, 1, 0) action6,
IF((SELECT COUNT(1) FROM UNNEST(SPLIT(sequence)) value WHERE value = '7' ) > 0, 1, 0) action7,
IF((SELECT COUNT(1) FROM UNNEST(SPLIT(sequence)) value WHERE value = '8' ) > 0, 1, 0) action8,
IF((SELECT COUNT(1) FROM UNNEST(SPLIT(sequence)) value WHERE value = '9' ) > 0, 1, 0) action9,
IF((SELECT COUNT(1) FROM UNNEST(SPLIT(sequence)) value WHERE value IN ('10', '11' )) > 0, 1, 0) action10
FROM `project.dataset.table`
-- ORDER BY userID
this will give you something like below
Row userID action1 action2 action3 action4 action5 action6 action7 action8 action9 action10
1 123abc 1 1 1 1 1 1 1 1 0 0
2 456bcd 1 1 1 1 1 1 1 1 1 1
3 789def 1 1 0 0 0 0 0 0 0 0
it is simplified - but gives you some guidance
as you asked for :o)
See below idea for refactoring (which is endless process usually) so it is less verbose at least
#standardSQL
WITH `project.dataset.table` AS (
SELECT '123abc' userID, '1,2,3,4,5,6,7,8' sequence UNION ALL
SELECT '456bcd', '1,2,3,4,5,6,7,8,9,10,11' UNION ALL
SELECT '789def', '1,2'
)
SELECT userID,
IF('1' IN UNNEST(SPLIT(sequence)), 1, 0) AS action1,
IF('2' IN UNNEST(SPLIT(sequence)), 1, 0) AS action2,
IF('3' IN UNNEST(SPLIT(sequence)), 1, 0) AS action3,
IF('4' IN UNNEST(SPLIT(sequence)), 1, 0) AS action4,
IF('5' IN UNNEST(SPLIT(sequence)), 1, 0) AS action5,
IF('6' IN UNNEST(SPLIT(sequence)), 1, 0) AS action6,
IF('7' IN UNNEST(SPLIT(sequence)), 1, 0) AS action7,
IF('8' IN UNNEST(SPLIT(sequence)), 1, 0) AS action8,
IF('9' IN UNNEST(SPLIT(sequence)), 1, 0) AS action9,
IF((SELECT COUNT(1) FROM UNNEST(SPLIT(sequence)) value WHERE value IN ('10', '11' )) > 0, 1, 0) action10
FROM `project.dataset.table`
Update to address your comments about UNION ALL
Above is using dummy data from your question just so you can test, play with it - meantime the solution actually is
#standardSQL
SELECT userID,
IF('1' IN UNNEST(SPLIT(sequence)), 1, 0) AS action1,
IF('2' IN UNNEST(SPLIT(sequence)), 1, 0) AS action2,
IF('3' IN UNNEST(SPLIT(sequence)), 1, 0) AS action3,
IF('4' IN UNNEST(SPLIT(sequence)), 1, 0) AS action4,
IF('5' IN UNNEST(SPLIT(sequence)), 1, 0) AS action5,
IF('6' IN UNNEST(SPLIT(sequence)), 1, 0) AS action6,
IF('7' IN UNNEST(SPLIT(sequence)), 1, 0) AS action7,
IF('8' IN UNNEST(SPLIT(sequence)), 1, 0) AS action8,
IF('9' IN UNNEST(SPLIT(sequence)), 1, 0) AS action9,
IF((SELECT COUNT(1) FROM UNNEST(SPLIT(sequence)) value WHERE value IN ('10', '11' )) > 0, 1, 0) action10
FROM `project.dataset.table`
推荐阅读
- apache - Htaccess 重定向 301 不起作用,一切似乎都很好
- power-automate - Microsoft Flow:“发送电子邮件”给多个用户的电子邮件 ID
- python - Python3:尝试在两个字符之间插入文本 - inputFile 与 file.write() 冲突
- java - Javafx:使用 FlowPane 的响应式设计
- javascript - 无效的网络服务调用
- javascript - 如何获取 unicode 中的 HTML 文本?
- spring-boot - Spring Boot 2.1 复制 reactiveMongoTemplate bean
- perl - 使用 =~ 运算符进行字符串比较
- c# - 使用 linq2db 联合不同类型
- javascript - 如果@ char 在使用javascript的字符串中的大括号内,如何不显示下拉菜单或做出反应?