首页 > 解决方案 > BigQuery comma delimited string evaluation

问题描述

I am using BigQuery and I'm trying to parse a comma delimited string to find specific numbers within it.

Example table as below

|--------|-----------------------------------------------------------|
| userID | sequence                                                  |
|--------|-----------------------------------------------------------|
| 123abc | 1,2,3,4,5,6,7,8                                           |                                          
|--------|-----------------------------------------------------------|
| 456bcd | 1,2,3,4,5,6,7,8,9,10,11                                   |
|--------|-----------------------------------------------------------|
| 789def | 1,2,3,4                                                   |
|--------|-----------------------------------------------------------|

I need to create a CASE statement where each value of the string 'sequence' is evaluated as per the following logic and the result outputted to its own column.

SELECT userID
,sequence
,CASE WHEN sequence CONTAINS '1' THEN 1 ELSE 0 END AS action1 
,CASE WHEN sequence CONTAINS '2' THEN 1 ELSE 0 END AS action2 
,CASE WHEN sequence CONTAINS '3' THEN 1 ELSE 0 END as action3
....
,CASE WHEN sequence CONTAINS '9' AND '11' THEN 1 ELSE 0 END as action10

This would produce the following output.

|--------|-------------------------|-------|-------|-------|---------|
| userID | sequence                |action1|action2|action3|action10 |
|--------|-------------------------|-------|-------|-------|---------|
| 123abc | 1,2,3,4,5,6,7,8         |   1   |   1   |   1   |    0    |                                          
|--------|-------------------------|-------|-------|-------|---------|
| 456bcd | 1,2,3,4,5,6,7,8,9,10,11 |   1   |   1   |   1   |    1    |
|--------|-------------------------|-------|-------|-------|---------|
| 789def | 1,2                     |   1   |   1   |   0   |    0    |
|--------|-------------------------|-------|-------|-------|---------|

Please not the last CASE WHEN statement is very important as I need to account for this very specific combination of string values as its own unique action.

I believe this would be achievable in SQL Server using something like:

CASE WHEN CHARINDEX('1', 'sequence')>0 THEN 1 ELSE 0 END as action1
,CASE WHEN CHARINDEX('2', 'sequence')>0 THEN 1 ELSE 0 END as action2
,CASE WHEN CHARINDEX('3', 'sequence')>0 THEN 1 ELSE 0 END as action3
...
,CASE WHEN CHARINDEX('9', 'sequence')>0 AND CHARINDEX('11', 'sequence')>0 THEN 1 ELSE 0 END as action10

However I cannot find an equivalent function in BigQuery that would achieve the same result and my attempts at REGEX have also fallen short.

I'd be very grateful for some guidance here. Thanks in advance.

标签: sqlregexstringgoogle-bigquery

解决方案


see direction below (for BigQuery Standard SQL)

#standardSQL
WITH `project.dataset.table` AS (
  SELECT '123abc' userID, '1,2,3,4,5,6,7,8' sequence UNION ALL
  SELECT '456bcd', '1,2,3,4,5,6,7,8,9,10,11' UNION ALL
  SELECT '789def', '1,2' 
)
SELECT userID, 
  IF((SELECT COUNT(1) FROM UNNEST(SPLIT(sequence)) value WHERE value = '1' ) > 0, 1, 0)  action1,
  IF((SELECT COUNT(1) FROM UNNEST(SPLIT(sequence)) value WHERE value = '2' ) > 0, 1, 0)  action2,
  IF((SELECT COUNT(1) FROM UNNEST(SPLIT(sequence)) value WHERE value = '3' ) > 0, 1, 0)  action3,
  IF((SELECT COUNT(1) FROM UNNEST(SPLIT(sequence)) value WHERE value = '4' ) > 0, 1, 0)  action4,
  IF((SELECT COUNT(1) FROM UNNEST(SPLIT(sequence)) value WHERE value = '5' ) > 0, 1, 0)  action5,
  IF((SELECT COUNT(1) FROM UNNEST(SPLIT(sequence)) value WHERE value = '6' ) > 0, 1, 0)  action6,
  IF((SELECT COUNT(1) FROM UNNEST(SPLIT(sequence)) value WHERE value = '7' ) > 0, 1, 0)  action7,
  IF((SELECT COUNT(1) FROM UNNEST(SPLIT(sequence)) value WHERE value = '8' ) > 0, 1, 0)  action8,
  IF((SELECT COUNT(1) FROM UNNEST(SPLIT(sequence)) value WHERE value = '9' ) > 0, 1, 0)  action9,
  IF((SELECT COUNT(1) FROM UNNEST(SPLIT(sequence)) value WHERE value IN ('10', '11' )) > 0, 1, 0)  action10
FROM `project.dataset.table`
-- ORDER BY userID  

this will give you something like below

Row userID  action1 action2 action3 action4 action5 action6 action7 action8 action9 action10     
1   123abc  1       1       1       1       1       1       1       1       0       0    
2   456bcd  1       1       1       1       1       1       1       1       1       1    
3   789def  1       1       0       0       0       0       0       0       0       0      

it is simplified - but gives you some guidance as you asked for :o)

See below idea for refactoring (which is endless process usually) so it is less verbose at least

#standardSQL
WITH `project.dataset.table` AS (
  SELECT '123abc' userID, '1,2,3,4,5,6,7,8' sequence UNION ALL
  SELECT '456bcd', '1,2,3,4,5,6,7,8,9,10,11' UNION ALL
  SELECT '789def', '1,2' 
)
SELECT userID, 
  IF('1' IN UNNEST(SPLIT(sequence)), 1, 0) AS action1,
  IF('2' IN UNNEST(SPLIT(sequence)), 1, 0) AS action2,
  IF('3' IN UNNEST(SPLIT(sequence)), 1, 0) AS action3,
  IF('4' IN UNNEST(SPLIT(sequence)), 1, 0) AS action4,
  IF('5' IN UNNEST(SPLIT(sequence)), 1, 0) AS action5,
  IF('6' IN UNNEST(SPLIT(sequence)), 1, 0) AS action6,
  IF('7' IN UNNEST(SPLIT(sequence)), 1, 0) AS action7,
  IF('8' IN UNNEST(SPLIT(sequence)), 1, 0) AS action8,
  IF('9' IN UNNEST(SPLIT(sequence)), 1, 0) AS action9,
  IF((SELECT COUNT(1) FROM UNNEST(SPLIT(sequence)) value WHERE value IN ('10', '11' )) > 0, 1, 0)  action10
FROM `project.dataset.table`

Update to address your comments about UNION ALL

Above is using dummy data from your question just so you can test, play with it - meantime the solution actually is

#standardSQL
SELECT userID, 
  IF('1' IN UNNEST(SPLIT(sequence)), 1, 0) AS action1,
  IF('2' IN UNNEST(SPLIT(sequence)), 1, 0) AS action2,
  IF('3' IN UNNEST(SPLIT(sequence)), 1, 0) AS action3,
  IF('4' IN UNNEST(SPLIT(sequence)), 1, 0) AS action4,
  IF('5' IN UNNEST(SPLIT(sequence)), 1, 0) AS action5,
  IF('6' IN UNNEST(SPLIT(sequence)), 1, 0) AS action6,
  IF('7' IN UNNEST(SPLIT(sequence)), 1, 0) AS action7,
  IF('8' IN UNNEST(SPLIT(sequence)), 1, 0) AS action8,
  IF('9' IN UNNEST(SPLIT(sequence)), 1, 0) AS action9,
  IF((SELECT COUNT(1) FROM UNNEST(SPLIT(sequence)) value WHERE value IN ('10', '11' )) > 0, 1, 0)  action10
FROM `project.dataset.table`

推荐阅读