Trying to split one column into multiple columns using Snowflake SQL

Problem description


The data needs to be split from the SOID column into Circ, Language, and Words, as shown in the image above. I tried the following logic:

SELECT SOID,
 regexp_substr(SALES_ORDER_ITEM_DESCRIPTION, 'Circuit:\\s([a-zA-Z0-9 ]*)(,\\s|$)', 1, 1, 'e') AS "Circuit",
 regexp_substr(SALES_ORDER_ITEM_DESCRIPTION, 'Language\\(s\\):\\s([a-zA-Z0-9, ]+)(,\\s|$)', 1, 1, 'e') AS "Language",
 regexp_substr(SALES_ORDER_ITEM_DESCRIPTION, 'Words:\\s([a-zA-Z0-9 ]*)(,\\s|$)', 1, 1, 'e') AS "Words"
FROM XYZ;

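Most of the data is processed correctly, but some rows, highlighted in yellow in the image, are not captured by the logic above. Where the language should be English, the query returns null, and where the circuit should be Biotechnology...., it also returns null, as shown in the image. Any input would be appreciated.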

Tags: sql, snowflake-cloud-data-platform

Solution


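The problem appears to be related to handling the "(s)" part: the original Language pattern requires a literal "(s)", so a row that reads "Language: English" does not match. Likewise, the Circuit character class does not allow the square brackets in "Biotechnology Newsline [National]". Making "(s)" optional and adding the brackets to the character class fixes both cases: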

with XYZ as (
  select 'Attachments: 1, Circuit: North America, Language: English, Words: 400' as SALES_ORDER_ITEM_DESCRIPTION
  union all
  select 'Attachments: 1, Circuit: North America, Language(s): English,Spanish, Words: 500' as SALES_ORDER_ITEM_DESCRIPTION
  union all
  select 'Attachments: 1, Circuit: Biotechnology Newsline [National], Language(s): English, Words: 600' as SALES_ORDER_ITEM_DESCRIPTION
)
SELECT
  regexp_substr(SALES_ORDER_ITEM_DESCRIPTION, 'Circuit:\\s([a-zA-Z0-9 \\[\\]]+)(,\\s|$)', 1, 1, 'e') AS "Circuit",
  regexp_substr(SALES_ORDER_ITEM_DESCRIPTION, 'Language[()s]*:\\s([a-zA-Z0-9\\, ]+)(,\\s|$)', 1, 1, 'e') AS "Language"
FROM XYZ;


+-----------------------------------+-----------------+
|              Circuit              |    Language     |
+-----------------------------------+-----------------+
| North America                     | English         |
| North America                     | English,Spanish |
| Biotechnology Newsline [National] | English         |
+-----------------------------------+-----------------+
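
The question also asked for a Words column, which the query above leaves out. The following is a minimal sketch rather than a definitive fix. It assumes the same three sample rows, reuses the Words pattern from the question unchanged, and matches "(s)" as an explicit optional group instead of the looser character class [()s]*. Because that optional group becomes capture group 1, the sketch passes 2 as the optional sixth argument (group_num) of regexp_substr to pick out the language value.

```sql
-- Sample rows copied from the answer above; XYZ stands in for the real table.
WITH XYZ AS (
  SELECT 'Attachments: 1, Circuit: North America, Language: English, Words: 400' AS SALES_ORDER_ITEM_DESCRIPTION
  UNION ALL
  SELECT 'Attachments: 1, Circuit: North America, Language(s): English,Spanish, Words: 500'
  UNION ALL
  SELECT 'Attachments: 1, Circuit: Biotechnology Newsline [National], Language(s): English, Words: 600'
)
SELECT
  -- Character class widened to allow the square brackets in "[National]".
  regexp_substr(SALES_ORDER_ITEM_DESCRIPTION, 'Circuit:\\s([a-zA-Z0-9 \\[\\]]+)(,\\s|$)', 1, 1, 'e') AS "Circuit",
  -- "(s)" is optional capture group 1, so the value is group 2 (sixth argument).
  regexp_substr(SALES_ORDER_ITEM_DESCRIPTION, 'Language(\\(s\\))?:\\s([a-zA-Z0-9, ]+)(,\\s|$)', 1, 1, 'e', 2) AS "Language",
  -- Words pattern reused unchanged from the question.
  regexp_substr(SALES_ORDER_ITEM_DESCRIPTION, 'Words:\\s([a-zA-Z0-9 ]*)(,\\s|$)', 1, 1, 'e') AS "Words"
FROM XYZ;
```

If the real descriptions contain other characters in these fields (hyphens, ampersands, and so on), the character classes would likely need widening in the same way as was done for the square brackets.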
