stored-procedures - 在 Synapse 中动态创建外部表
问题描述
我有一个 json 字符串,它是数据工厂中 Get Metadata 活动的结果,格式如下:{"structure": [{"name": "Id","type": "String"},{"name": "IsDeleted","type": "Boolean"},{"name": "RowVersion","type": "Byte[]"}, {"name": "ModelId","type": "String"}]
这个 json 显示了 parquet 文件的模式。我将此 json 与 parquet 文件名一起传递uctnow()
给一个存储过程,以使用以下代码为该文件创建一个模式表
CREATE OR ALTER PROCEDURE [CreateExternalTables] (
@schema NVARCHAR (MAX), @tableName NVARCHAR(MAX), @ExecTime NVARCHAR(MAX)
) AS
BEGIN
IF OBJECT_ID('tempdb..#tables_to_create', 'U') IS NOT NULL
DROP TABLE #tables_to_create
CREATE TABLE #tables_to_create (
tableName NVARCHAR (MAX),
fieldOrder NVARCHAR (MAX),
fieldName NVARCHAR (MAX),
fieldType NVARCHAR (MAX),
translatedType NVARCHAR (MAX),
executeTime NVARCHAR (MAX)
)
BEGIN
WITH Fields (fieldOrder, fieldName, fieldType) AS (
SELECT
[key] AS fieldOrder,
JSON_VALUE([value], '$.name') AS fieldName,
JSON_VALUE([value], '$.type') AS fieldType
FROM
OPENJSON(@schema)
)
INSERT INTO
#tables_to_create(
tableName,
fieldOrder,
fieldName,
fieldType,
translatedType,
executeTime
)
SELECT
@tableName,
fieldOrder,
fieldName,
fieldType,
CASE
WHEN fieldType = 'Single' THEN 'real'
WHEN fieldType = 'Boolean' THEN 'bit'
WHEN fieldType = 'Double' THEN 'float'
WHEN fieldType = 'Int64' THEN 'bigint'
ELSE NULL
END AS translatedType,
@ExecTime
FROM
Fields
END
END;
我想做的是在存储过程中创建这个 temo 表,以便以自动方式创建外部表。
准确地说,我正在关注此链接: create-external-dataset
非常感谢任何帮助。
更新 在@wBob 的帮助下解决了第一个问题后,我按照链接的其余部分创建外部表,并将以下内容插入到我的存储过程中:
FROM
Fields
END
Declare @sqlCommand nvarchar(max);
Declare @folderPath nvarchar(max);
SET
@sqlCommand = 'IF NOT EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N''[dbo].[' + @tableName + ']'') AND type in (N''U''))
CREATE EXTERNAL TABLE [dbo].[' + @tableName + '] ('
WHILE((SELECT COUNT(*) FROM #tables_to_create) > 0)
BEGIN
DECLARE @key int
SELECT
@key = MIN(fieldOrder)
FROM
#tables_to_create
WHERE
executeTime = @ExecTime
DECLARE @fieldName VARCHAR(50)
DECLARE @translatedType VARCHAR(50)
SELECT
@fieldName = fieldName,
@translatedType = translatedType
FROM
#tables_to_create
WHERE
fieldOrder = @key
AND executeTime = @ExecTime
SET
@sqlCommand = @sqlCommand + '
[' + @fieldName + '] [' + @translatedType + '] NULL'
DELETE FROM
#tables_to_create
WHERE
fieldOrder = @key
AND executeTime = @ExecTime
IF((SELECT COUNT(*) FROM #tables_to_create WHERE executeTime = @ExecTime) > 0)
SET
@sqlCommand = @sqlCommand + ', '
END
SET
@sqlCommand = @sqlCommand + '
)
WITH
(
LOCATION = ''/' + /main/json/ + ''',
DATA_SOURCE = DataLakeStaged,
FILE_FORMAT = StagedParquet
)'
EXEC(@sqlCommand)
END;
解决方案
WHILE
根据错误消息, Azure Synapse Analytics中的语句条件不支持查询。专用 SQL 池是 MPP 系统,与盒装产品 SQL Server 和 Azure SQL DB 略有不同,更一般地说,循环是一种不能很好地转换的模式。您应该考虑创建一个不循环的存储过程,并将任何循环留给可以并行循环的 Azure 数据工厂 (ADF) / Synapse Pipelines 中的 For Each 活动。
如果您确实需要循环,在线有多个在专用 SQL 池中执行循环的示例,但官方文档中有一个很好的示例,如果链接移动,请在此处复制:
-- First, create a temporary table containing a unique row number used to identify the individual statements:
CREATE TABLE #tbl
WITH
( DISTRIBUTION = ROUND_ROBIN
)
AS
SELECT ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) AS Sequence
, [name]
, 'UPDATE STATISTICS '+QUOTENAME([name]) AS sql_code
FROM sys.tables
;
-- Second, initialize the variables required to perform the loop:
DECLARE @nbr_statements INT = (SELECT COUNT(*) FROM #tbl)
, @i INT = 1
;
-- Now loop over statements executing them one at a time:
WHILE @i <= @nbr_statements
BEGIN
DECLARE @sql_code NVARCHAR(4000) = (SELECT sql_code FROM #tbl WHERE Sequence = @i);
EXEC sp_executesql @sql_code;
SET @i +=1;
END
推荐阅读
- excel - VBA:转到 IE 站点并按元素打勾
- amazon-web-services - 如何使用来自 ACM 的 AWS 证书使用亚马逊 EC2 从端口 80 重定向到 443?
- ruby-on-rails - 在 Rails6 中使用多个数据库以及切换数据库的问题
- django - 带有post方法的Django ListView,get_queryset没有过滤
- sql - 通过 postgresql 插入表中的唯一行
- python - 在 Python 中映射来自两个集合的值对时如何生成唯一 ID
- node.js - 通过 Node 在 Jest 调试中包含项目目录
- swiftui - 视图不会在嵌套 ForEach 循环中重新呈现
- regex - 如果条件,bash中的奇怪正则表达式行为
- reactjs - 将动态道具传递给延迟加载组件?