首页 > 解决方案 > 雪花 JSON 解析在数组(列表)中没有“键”的情况下展平项目

问题描述

我正在尝试解析来自 AWS 文本的 JSON 提取。

{
  "BlockType": "CELL",
  "ColumnIndex": 2,
  "ColumnSpan": 1,
  "Confidence": 99.87053680419922,
  "Geometry": {
    "BoundingBox": {
      "Height": 0.023682409897446632,
      "Left": 0.10847073048353195,
      "Top": 0.6578078269958496,
      "Width": 0.18180404603481293
    },
    "Polygon": [
      {
        "X": 0.10847073048353195,
        "Y": 0.6578078269958496
      },
      {
        "X": 0.2902747690677643,
        "Y": 0.6578078269958496
      },
      {
        "X": 0.2902747690677643,
        "Y": 0.6814902424812317
      },
      {
        "X": 0.10847073048353195,
        "Y": 0.6814902424812317
      }
    ]
  },
  "Id": "8f1d94e9-f936-4ea9-bae8-b3e0347c5411",
  "Page": 1,
  "Relationships": [
    {
      "Ids": [
        "5b7b930f-5715-4fa6-b85b-9e2253e2b389",
        "52e9d462-5aa2-4df2-9a33-3b59e03ea256",
        "92d278f8-2d44-41ce-8cfd-35feccfba29d",
        "69121af9-3270-488b-9335-70359859f9d6",
        "827011b6-fc8c-4749-89bd-91f376eaa554",
        "dd001bd6-f146-41e0-b5b3-6afb2bf37f17",
        "a4286c8f-f7a2-4b7d-84b0-e38b7cd93225"
      ],
      "Type": "CHILD"
    }
  ],
  "RowIndex": 1,
  "RowSpan": 1
}

当我尝试使用以下方法进行展平时:

lateral flatten(input => SOURCE_CONTENTS:Relationships[0]:Ids , MODE=> 'BOTH') where SOURCE_CONTENTS:BlockType = 'CELL'

where 语句返回 Id 的完整列表。所有 7 人每行重复 7 次。我想得到数组中的第一个、第二个、第三个项目。每行 1 个。

26
[ { "Ids": [ "5b7b930f-5715-4fa6-b85b-9e2253e2b389", "52e9d462-5aa2-4df2-9a33-3b59e03ea256", "92d278f8-2d44-41ce-8cfd-35feccfba29d", "69121af9-3270-488b-9335-70359859f9d6", "827011b6-fc8c-4749-89bd-91f376eaa554", ...
27
[ { "Ids": [ "5b7b930f-5715-4fa6-b85b-9e2253e2b389", "52e9d462-5aa2-4df2-9a33-3b59e03ea256", "92d278f8-2d44-41ce-8cfd-35feccfba29d", "69121af9-3270-488b-9335-70359859f9d6", "827011b6-fc8c-4749-89bd-91f376eaa554", ...
28
[ { "Ids": [ "5b7b930f-5715-4fa6-b85b-9e2253e2b389", "52e9d462-5aa2-4df2-9a33-3b59e03ea256", "92d278f8-2d44-41ce-8cfd-35feccfba29d", "69121af9-3270-488b-9335-70359859f9d6", "827011b6-fc8c-4749-89bd-91f376eaa554", ...
29
[ { "Ids": [ "5b7b930f-5715-4fa6-b85b-9e2253e2b389", "52e9d462-5aa2-4df2-9a33-3b59e03ea256", "92d278f8-2d44-41ce-8cfd-35feccfba29d", "69121af9-3270-488b-9335-70359859f9d6", "827011b6-fc8c-4749-89bd-91f376eaa554", ...
30
[ { "Ids": [ "5b7b930f-5715-4fa6-b85b-9e2253e2b389", "52e9d462-5aa2-4df2-9a33-3b59e03ea256", "92d278f8-2d44-41ce-8cfd-35feccfba29d", "69121af9-3270-488b-9335-70359859f9d6", "827011b6-fc8c-4749-89bd-91f376eaa554", ...
31
[ { "Ids": [ "5b7b930f-5715-4fa6-b85b-9e2253e2b389", "52e9d462-5aa2-4df2-9a33-3b59e03ea256", "92d278f8-2d44-41ce-8cfd-35feccfba29d", "69121af9-3270-488b-9335-70359859f9d6", "827011b6-fc8c-4749-89bd-91f376eaa554", ...
32
[ { "Ids": [ "5b7b930f-5715-4fa6-b85b-9e2253e2b389", "52e9d462-5aa2-4df2-9a33-3b59e03ea256", "92d278f8-2d44-41ce-8cfd-35feccfba29d", "69121af9-3270-488b-9335-70359859f9d6", "827011b6-fc8c-4749-89bd-91f376eaa554", ...

预期输出:

'5b7b930f-5715-4fa6-b85b-9e2253e2b389'

'52e9d462-5aa2-4df2-9a33-3b59e03ea256'

'92d278f8-2d44-41ce-8cfd-35feccfba29d'

'69121af9-3270-488b-9335-70359859f9d6'

'827011b6-fc8c-4749-89bd-91f376eaa554'

编辑: 既然你问了输出。我们真正希望它能够重新创建从 Snowflake 中的 AWS 文本提取的表。这只是第一部分。获取单元格内容以返回与单元格关联的 ID。然后我们可以查找单元格的内容并重建表格。如您所见,“CELL”块包含列和行引用以及指向单元格内容的子关系。

我试图把这个输出放在一个包含这个孩子列表的 where 语句中,这样我们就可以得到与他们相关的单词。那可能是错误的做法。

例如:如果子值 '861db385-0764-4fef-a4c1-f354a5e08fce' 采用以下格式。我可以提取单元格的内容,包括与单元格关联的单词。

SELECT SOURCE_CONTENTS, SOURCE_CONTENTS:Text, SOURCE_CONTENTS:Id ,SNIPPET_ORDER
FROM temp_table
WHERE SOURCE_CONTENTS:Id IN ('5b7b930f-5715-4fa6-b85b-9e2253e2b389',
        '52e9d462-5aa2-4df2-9a33-3b59e03ea256',
        '92d278f8-2d44-41ce-8cfd-35feccfba29d',
        '69121af9-3270-488b-9335-70359859f9d6',
        '827011b6-fc8c-4749-89bd-91f376eaa554',
        'dd001bd6-f146-41e0-b5b3-6afb2bf37f17',
        'a4286c8f-f7a2-4b7d-84b0-e38b7cd93225' ) // Child values
;

标签: arraysjsonlistparsingsnowflake-cloud-data-platform

解决方案


您没有为其他人提供足够的上下文(表定义、准确的输出等)来提供全面的解决方案。我相信这(或类似的东西)可能是你正在寻找的东西,但不确定......

WITH MY_TABLE AS (
  SELECT PARSE_JSON($1) AS SOURCE_CONTENTS
    FROM VALUES ($$
{
  "BlockType": "CELL",
  "ColumnIndex": 2,
  "ColumnSpan": 1,
  "Confidence": 99.87053680419922,
  "Geometry": {
    "BoundingBox": {
      "Height": 0.023682409897446632,
      "Left": 0.10847073048353195,
      "Top": 0.6578078269958496,
      "Width": 0.18180404603481293
    },
    "Polygon": [
      {
        "X": 0.10847073048353195,
        "Y": 0.6578078269958496
      },
      {
        "X": 0.2902747690677643,
        "Y": 0.6578078269958496
      },
      {
        "X": 0.2902747690677643,
        "Y": 0.6814902424812317
      },
      {
        "X": 0.10847073048353195,
        "Y": 0.6814902424812317
      }
    ]
  },
  "Id": "8f1d94e9-f936-4ea9-bae8-b3e0347c5411",
  "Page": 1,
  "Relationships": [
    {
      "Ids": [
        "5b7b930f-5715-4fa6-b85b-9e2253e2b389",
        "52e9d462-5aa2-4df2-9a33-3b59e03ea256",
        "92d278f8-2d44-41ce-8cfd-35feccfba29d",
        "69121af9-3270-488b-9335-70359859f9d6",
        "827011b6-fc8c-4749-89bd-91f376eaa554",
        "dd001bd6-f146-41e0-b5b3-6afb2bf37f17",
        "a4286c8f-f7a2-4b7d-84b0-e38b7cd93225"
      ],
      "Type": "CHILD"
    }
  ],
  "RowIndex": 1,
  "RowSpan": 1
}
    $$)
)
SELECT ARRAY_AGG(F.VALUE) AS Relationship_Ids
  FROM MY_TABLE T
      ,LATERAL FLATTEN(SOURCE_CONTENTS:"Relationships"[0]:"Ids") F
 WHERE SOURCE_CONTENTS:"BlockType"::VARCHAR = 'CELL'
 GROUP BY F.SEQ
;

推荐阅读