首页 > 解决方案 > 在 Synapse 中动态创建外部表

问题描述

我有一个 json 字符串,它是数据工厂中 Get Metadata 活动的结果,格式如下:{"structure": [{"name": "Id","type": "String"},{"name": "IsDeleted","type": "Boolean"},{"name": "RowVersion","type": "Byte[]"}, {"name": "ModelId","type": "String"}]

这个 json 显示了 parquet 文件的模式。我将此 json 与 parquet 文件名一起传递uctnow()给一个存储过程,以使用以下代码为该文件创建一个模式表

CREATE OR ALTER PROCEDURE [CreateExternalTables] (
@schema NVARCHAR (MAX), @tableName NVARCHAR(MAX), @ExecTime NVARCHAR(MAX)
) AS

BEGIN
IF OBJECT_ID('tempdb..#tables_to_create', 'U') IS NOT NULL
        DROP TABLE #tables_to_create
CREATE TABLE #tables_to_create (
    tableName NVARCHAR (MAX),
    fieldOrder NVARCHAR (MAX),
    fieldName NVARCHAR (MAX),
    fieldType NVARCHAR (MAX),
    translatedType NVARCHAR (MAX),
    executeTime NVARCHAR (MAX)
)

BEGIN

WITH Fields (fieldOrder, fieldName, fieldType) AS (
  SELECT
    [key] AS fieldOrder,
    JSON_VALUE([value], '$.name') AS fieldName,
    JSON_VALUE([value], '$.type') AS fieldType
  FROM
    OPENJSON(@schema)
)
INSERT INTO
  #tables_to_create(
    tableName,
    fieldOrder,
    fieldName,
    fieldType,
    translatedType,
    executeTime
  )
SELECT
  @tableName,
  fieldOrder,
  fieldName,
  fieldType,
   CASE
    WHEN fieldType = 'Single' THEN 'real'
    WHEN fieldType = 'Boolean' THEN 'bit'
    WHEN fieldType = 'Double' THEN 'float'
    WHEN fieldType = 'Int64' THEN 'bigint'
    ELSE NULL  
  END AS translatedType,
  @ExecTime
FROM
  Fields
END
END;

整体设置如下: 在此处输入图像描述

但我收到以下错误,我不知道为什么: 在此处输入图像描述

我想做的是在存储过程中创建这个 temo 表,以便以自动方式创建外部表。

准确地说,我正在关注此链接: create-external-dataset

非常感谢任何帮助。

更新 在@wBob 的帮助下解决了第一个问题后,我按照链接的其余部分创建外部表,并将以下内容插入到我的存储过程中:

FROM
  Fields
END
Declare @sqlCommand nvarchar(max);
Declare @folderPath nvarchar(max);

SET
  @sqlCommand = 'IF NOT EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N''[dbo].[' + @tableName + ']'') AND type in (N''U''))
  CREATE EXTERNAL TABLE [dbo].[' + @tableName + '] (' 

WHILE((SELECT COUNT(*) FROM #tables_to_create) > 0)
BEGIN
  DECLARE @key int
  SELECT
    @key = MIN(fieldOrder)
  FROM
    #tables_to_create
  WHERE
    executeTime = @ExecTime
     
  DECLARE @fieldName VARCHAR(50)  
  DECLARE @translatedType VARCHAR(50)
 
  SELECT
    @fieldName = fieldName,
    @translatedType = translatedType
  FROM
    #tables_to_create
  WHERE
    fieldOrder = @key
    AND executeTime = @ExecTime
 
  SET
    @sqlCommand = @sqlCommand + '
          [' + @fieldName + '] [' + @translatedType + '] NULL'
 
  DELETE FROM
    #tables_to_create
  WHERE
    fieldOrder = @key
    AND executeTime = @ExecTime
 
  IF((SELECT COUNT(*) FROM #tables_to_create WHERE executeTime = @ExecTime) > 0)
    SET
      @sqlCommand = @sqlCommand + ', '
END
 
SET
  @sqlCommand = @sqlCommand + '
  )
  WITH
  (
    LOCATION = ''/' + /main/json/ + ''',
    DATA_SOURCE = DataLakeStaged,
    FILE_FORMAT = StagedParquet
  )'
   
EXEC(@sqlCommand)
END;

但是我收到以下错误: 在此处输入图像描述

标签: stored-proceduresazure-data-factoryparquetazure-data-lakeazure-synapse

解决方案


WHILE根据错误消息, Azure Synapse Analytics中的语句条件不支持查询。专用 SQL 池是 MPP 系统,与盒装产品 SQL Server 和 Azure SQL DB 略有不同,更一般地说,循环是一种不能很好地转换的模式。您应该考虑创建一个不循环的存储过程,并将任何循环留给可以并行循环的 Azure 数据工厂 (ADF) / Synapse Pipelines 中的 For Each 活动。

如果您确实需要循环,在线有多个在专用 SQL 池中执行循环的示例,但官方文档中有一个很好的示例,如果链接移动,请在此处复制:

-- First, create a temporary table containing a unique row number used to identify the individual statements:
CREATE TABLE #tbl
WITH
( DISTRIBUTION = ROUND_ROBIN
)
AS
SELECT  ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) AS Sequence
,       [name]
,       'UPDATE STATISTICS '+QUOTENAME([name]) AS sql_code
FROM    sys.tables
;

-- Second, initialize the variables required to perform the loop:
DECLARE @nbr_statements INT = (SELECT COUNT(*) FROM #tbl)
,       @i INT = 1
;


-- Now loop over statements executing them one at a time:
WHILE   @i <= @nbr_statements
BEGIN
    DECLARE @sql_code NVARCHAR(4000) = (SELECT sql_code FROM #tbl WHERE Sequence = @i);
    EXEC    sp_executesql @sql_code;
    SET     @i +=1;
END

推荐阅读