BigQuery: check value for one of repeated ID and store in new column for all repeated IDs

Question

I have data with an ID column and a varying number of states per ID, where rows that share an ID belong together. I want to add a new column that is TRUE if a given ID has ever been in a particular state (state1 in the example below) and FALSE otherwise, with that value attached to every row for that ID.

Example initial data:

'''''''''''''''''
| ID | status   |
'''''''''''''''''
| 1  | state1   |
| 1  | state2   |
| 1  | state2   |
| 1  | state3   |
| 2  | state2   |
| 2  | state3   |
| 3  | state2   |
| 3  | state3   |
| 3  | state1   |
| ...| ...      |

and I want to get it to look like this:

''''''''''''''''''''''''''''''''''
| ID | status   | is_ever_state1 |
''''''''''''''''''''''''''''''''''
| 1  | state1   | TRUE           |
| 1  | state2   | TRUE           |
| 1  | state2   | TRUE           |
| 1  | state3   | TRUE           |
| 2  | state2   | FALSE          |
| 2  | state3   | FALSE          |
| 3  | state2   | TRUE           |
| 3  | state3   | TRUE           |
| 3  | state1   | TRUE           |
| ...| ...      | ...            |

Here is the code to create the initial table:

WITH table AS
 (SELECT 1 as ID, 'state1' as status UNION ALL
  SELECT 1, 'state2' UNION ALL
  SELECT 1, 'state2' UNION ALL
  SELECT 1, 'state3' UNION ALL
  SELECT 2, 'state2' UNION ALL
  SELECT 2, 'state3' UNION ALL
  SELECT 3, 'state2' UNION ALL
  SELECT 3, 'state3' UNION ALL
  SELECT 3, 'state1')
SELECT * FROM table

I'm a bit stuck on how to do this. One idea was to aggregate the statuses into a structure per ID, use a CASE expression to evaluate it on a per-ID basis, and then UNNEST afterwards. However, BigQuery won't let me access the children of the aggregated structure this way, so I don't think this is the right approach. My attempt:

SELECT
ID,
status,
case
  when status_struct.status = 'state1' then TRUE
  when status_struct.status != 'state1' then FALSE
end as is_ever_state1
FROM(
  SELECT 
  ID,
  status,
  ARRAY_AGG(STRUCT( status))AS status_struct,
  FROM table
  GROUP BY ID,status)

Tags: sql, google-bigquery

Solution


The following is for BigQuery Standard SQL:

#standardSQL
SELECT *, 
  MAX(status = 'state1') OVER(PARTITION BY ID) AS is_ever_state1
FROM table    

Applied to the sample data from your question, the output is:

Row  ID  status  is_ever_state1
1    1   state1  true
2    1   state2  true
3    1   state2  true
4    1   state3  true
5    2   state2  false
6    2   state3  false
7    3   state2  true
8    3   state3  true
9    3   state1  true
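
For anyone who wants to run this end to end, here is a minimal self-contained sketch that combines the sample rows from the question with the window query above. The CTE name `sample_data` and the second column `is_ever_state1_alt` are illustrative names, not part of the original question or answer. The `LOGICAL_OR` variant is offered only as an assumed-equivalent alternative that states the intent more directly.

```sql
#standardSQL
-- Sample rows copied from the question; `sample_data` is a hypothetical CTE name.
WITH sample_data AS (
  SELECT 1 AS ID, 'state1' AS status UNION ALL
  SELECT 1, 'state2' UNION ALL
  SELECT 1, 'state2' UNION ALL
  SELECT 1, 'state3' UNION ALL
  SELECT 2, 'state2' UNION ALL
  SELECT 2, 'state3' UNION ALL
  SELECT 3, 'state2' UNION ALL
  SELECT 3, 'state3' UNION ALL
  SELECT 3, 'state1'
)
SELECT
  ID,
  status,
  -- The comparison yields a BOOL per row. TRUE orders above FALSE, so MAX over
  -- the ID partition is TRUE as soon as any row for that ID matches.
  MAX(status = 'state1') OVER (PARTITION BY ID) AS is_ever_state1,
  -- Alternative (assumption: same result on this data): OR the per-row
  -- comparisons together within each ID partition.
  LOGICAL_OR(status = 'state1') OVER (PARTITION BY ID) AS is_ever_state1_alt
FROM sample_data
ORDER BY ID;
```

Because the window has a `PARTITION BY` but no `ORDER BY`, each row sees every row for its ID, which is why the flag is repeated on all of them. One caveat, not covered by the sample data: both aggregates skip rows where `status` is NULL, so an ID whose statuses are all NULL would get NULL rather than FALSE. If that matters, wrapping the expression in `IFNULL(..., FALSE)` is one way to handle it.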
