首页 > 解决方案 > Transposing Strings

问题描述

I am exploring mlb baseball statcast data, which is at the pitch level. I want to summarize the pitches at the plate appearance level. The relevant columns:

game_pk | pitcher_id | batter_id | inning | pitch_number | pitch_type | plt_apprnc_pk (unique)
----------------------------------------------------------------------------------------------
492321  |   542881   |   518466  |    2   |       1      |    FF      |   4923215428815184662
492321  |   543475   |   571448  |    1   |       2      |    SL      |   4923215434755714481
492321  |   543475   |   571448  |    1   |       1      |    FF      |   4923215434755714481
492321  |   543475   |   571448  |    1   |       3      |    FF      |   4923215434755714481

I am working on a query only showing one iteration of each plate appearance (plt_apprnc_pk), followed by pitch_number columns (1st pitch, 2nd pitch, 3rd pitch, etc...) showing each pitch type for the respective pitch number. In essence I want to transpose the pitch_type by pitch_number.

plt_apprnc_pk (unique)| first_pitch | second_pitch | third_pitch
----------------------------------------------------------------
  4923215434755714481 |     SL      |      FF      |     FF

I tried doing this with Case statements (ie CASE WHEN pitch_number = 1 THEN pitch_type END). I keep getting multiple rows for each plt_apprnc_pk.

plt_apprnc_pk (unique)| first_pitch | second_pitch | third_pitch
------------------------------------------------------------------
  4923215434755714481 |     SL      |      NULL    |    NULL
  4923215434755714481 |    NULL     |      FF      |    NULL
  4923215434755714481 |    NULL     |      NULL    |     FF

What is the best way to transpose these strings based on pitch_number while keeping only one iteration of the unique key??

标签: sqlsnowflake-cloud-data-platformtranspose

解决方案


I tried doing this with Case statements (ie CASE WHEN pitch_number = 1 THEN pitch_type END). I keep getting multiple rows for each plt_apprnc_pk.

The pattern here is called conditional aggregation:

SELECT plt_apprnc_pk
     ,MAX(CASE WHEN pitch_number = 1 THEN pitch_type END) AS first_pitch 
     ,MAX(CASE WHEN pitch_number = 2 THEN pitch_type END) AS second_pitch 
     ,MAX(CASE WHEN pitch_number = 3 THEN pitch_type END) AS thrid_pitch 
     -- ...
     --,MAX(CASE WHEN pitch_number = n THEN pitch_type END) AS nth_pitch 
FROM tab
GROUP BY plt_apprnc_pk;

Note: The maximum "pitch_number" has to be known in advance.


推荐阅读