sql-server - 哈希键列的隐式转换导致插入速度非常慢
问题描述
我正在尝试将一些数据插入到一些插入时间太长的阶段表中。例如,一个包含多达 600000 条记录的表需要将近一个小时才能完成。在查询的选择部分,我们正在创建列的哈希,稍后用于更改检测。由于我们使用的数据仓库方法 DataVault,我们无法删除更改哈希。当我查看执行计划时,我看到表达式中的类型转换警告。示例如下所示。
<Warnings>
<PlanAffectingConvert ConvertIssue="Cardinality Estimate" Expression="CONVERT(varchar(max),[load].[load_acbs_loan_product_dimension].[product_month],0)" />
我还包含了我们正在使用的插入语句的示例:
INSERT INTO [stage_acbs_balance_category_dimension]
( hk_h_balance_category_dimension
, balance_category_key
, balance_category_code
, balance_category_description
, balance_class_code
, balance_class_description
, user_define_code_1
, user_define_code_2
, user_define_code_3
, user_define_code_4
, user_define_code_5
, bal_cat_short_name
, include_bal_in_tax_reporting
, include_in_billings_statements
, include_in_past_due_reporting
, dss_change_hash_acbs_balance_category_dimension_lroc
, dss_record_source
, dss_load_date
, dss_create_time)
SELECT CAST(HASHBYTES('sha2_256',
COALESCE(CAST(load_acbs_balance_category_dimension.balance_category_key AS VARCHAR(MAX)),'null')
) AS BINARY(32)) AS hk_h_balance_category_dimension
, load_acbs_balance_category_dimension.balance_category_key AS balance_category_key
, load_acbs_balance_category_dimension.balance_category_code AS balance_category_code
, load_acbs_balance_category_dimension.balance_category_description AS balance_category_description
, load_acbs_balance_category_dimension.balance_class_code AS balance_class_code
, load_acbs_balance_category_dimension.balance_class_description AS balance_class_description
, load_acbs_balance_category_dimension.user_define_code_1 AS user_define_code_1
, load_acbs_balance_category_dimension.user_define_code_2 AS user_define_code_2
, load_acbs_balance_category_dimension.user_define_code_3 AS user_define_code_3
, load_acbs_balance_category_dimension.user_define_code_4 AS user_define_code_4
, load_acbs_balance_category_dimension.user_define_code_5 AS user_define_code_5
, load_acbs_balance_category_dimension.bal_cat_short_name AS bal_cat_short_name
, load_acbs_balance_category_dimension.include_bal_in_tax_reporting AS include_bal_in_tax_reporting
, load_acbs_balance_category_dimension.include_in_billings_statements AS include_in_billings_statements
, load_acbs_balance_category_dimension.include_in_past_due_reporting AS include_in_past_due_reporting
, CAST(HASHBYTES('SHA2_256',
COALESCE(CAST(load_acbs_balance_category_dimension.balance_category_code AS VARCHAR(MAX)),'null') +'||'+
COALESCE(CAST(load_acbs_balance_category_dimension.balance_category_description AS VARCHAR(MAX)),'null') +'||'+
COALESCE(CAST(load_acbs_balance_category_dimension.balance_class_code AS VARCHAR(MAX)),'null') +'||'+
COALESCE(CAST(load_acbs_balance_category_dimension.balance_class_description AS VARCHAR(MAX)),'null') +'||'+
COALESCE(CAST(load_acbs_balance_category_dimension.user_define_code_1 AS VARCHAR(MAX)),'null') +'||'+
COALESCE(CAST(load_acbs_balance_category_dimension.user_define_code_2 AS VARCHAR(MAX)),'null') +'||'+
COALESCE(CAST(load_acbs_balance_category_dimension.user_define_code_3 AS VARCHAR(MAX)),'null') +'||'+
COALESCE(CAST(load_acbs_balance_category_dimension.user_define_code_4 AS VARCHAR(MAX)),'null') +'||'+
COALESCE(CAST(load_acbs_balance_category_dimension.user_define_code_5 AS VARCHAR(MAX)),'null') +'||'+
COALESCE(CAST(load_acbs_balance_category_dimension.bal_cat_short_name AS VARCHAR(MAX)),'null') +'||'+
COALESCE(CAST(load_acbs_balance_category_dimension.include_bal_in_tax_reporting AS VARCHAR(MAX)),'null') +'||'+
COALESCE(CAST(load_acbs_balance_category_dimension.include_in_billings_statements AS VARCHAR(MAX)),'null') +'||'+
COALESCE(CAST(load_acbs_balance_category_dimension.include_in_past_due_reporting AS VARCHAR(MAX)),'null')
) AS BINARY(32)) AS dss_change_hash_acbs_balance_category_dimension_lroc
, load_acbs_balance_category_dimension.dss_record_source AS dss_record_source
, load_acbs_balance_category_dimension.dss_load_date AS dss_load_date
, getdate() AS dss_create_time
FROM [load_acbs_balance_category_dimension] load_acbs_balance_category_dimension
我正在寻找一种方法来摆脱隐式转换并让插入执行。我最初想让列成为计算的持久散列,但散列是不确定的。任何想法将不胜感激。
解决方案
我选择将 VARCHAR(MAX) 更改为 VARBINARY(MAX,这极大地提高了性能,从 1 小时缩短到 40 秒。我还希望更改散列算法。
推荐阅读
- android - 使用不支持多语言环境的 travis 构建应用程序(语言)
- javascript - JavaScript:对等连接上的黑屏
- r - ggplot2 barplot R中垂直条的轻微错位
- android - 如何在kotlin中将Unit转换为String(添加同步代码后UPD不起作用)
- powerapps - PowerApps 构建器未反映对自定义连接器所做的更新
- php - php json_encode 返回“??????” 而不是Unicode?
- python - 使用 pyarrow 将标量值从记录批次转换为十六进制字符串
- vue.js - 如何在nuxt js中动态头部
- javascript - 如何使用 javascript 解码 json 循环数据
- list - Spring Cloud Stream - 一对多转换