SQL Server: generating unique random strings

Problem description

With the help of Zohar's answer, I got a SQL function that generates random strings, but I am running into duplicates.

Query

Create FUNCTION [dbo].[MaskGenerator]
(    
    @Prefix nvarchar(4000), -- use null or an empty string for no prefix    
    @Suffix nvarchar(4000), -- use null or an empty string for no suffix    
    @MinLength int, -- the minimum length of the random part    
    @MaxLength int, -- the maximum length of the random part    
    @Count int, -- the maximum number of rows to return. Note: up to 1,000,000 rows           
    @CharType tinyint -- 1, 2 and 4 stand for lower-case, upper-case and digits. 
                      -- a bitwise combination of these values can be used to generate all possible combinations: 
                      -- 3: lower and upper, 5: lower and digits, 6: upper and digits, 7: lower, upper and digits
)
RETURNS TABLE
AS 
RETURN 

-- An inline tally table with 1,000,000 rows
WITH E1(N) AS (SELECT N FROM (VALUES (1), (2), (3), (4), (5), (6), (7), (8), (9), (10)) V(N)), -- 10
     E2(N) AS (SELECT 1 FROM E1 a, E1 b), --100
     E3(N) AS (SELECT 1 FROM E2 a, E2 b), --10,000
     Tally(N) AS (SELECT ROW_NUMBER() OVER (ORDER BY @@SPID) FROM E3 a, E2 b) --1,000,000 

SELECT TOP(@Count)  N As Number, 
        CONCAT(@Prefix, (
        SELECT  TOP (Length) 
                -- choose what char combination to use for the random part
                CASE @CharType 
                    WHEN 1 THEN LOWER
                    WHEN 2 THEN UPPER
                    WHEN 3 THEN IIF(Rnd % 2 = 0, LOWER, UPPER)
                    WHEN 4 THEN Digit
                    WHEN 5 THEN IIF(Rnd % 2 = 0, LOWER, Digit)
                    WHEN 6 THEN IIF(Rnd % 2 = 0, UPPER, Digit)
                    WHEN 7 THEN 
                        CASE Rnd % 3
                            WHEN 0 THEN LOWER
                            WHEN 1 THEN UPPER
                            ELSE Digit
                        END
                END
        FROM Tally As T0  
        -- create a random number from the guid using the GuidGenerator view
        CROSS APPLY (SELECT ABS(CHECKSUM(NewGuid)) As Rnd FROM GuidGenerator) AS RAND
        CROSS APPLY
        (
            -- generate a random lower-case char, upper-case char and digit
            SELECT  CHAR(97 + Rnd % 26) As LOWER, -- Random lower case letter
                    CHAR(65 + Rnd % 26) As UPPER,-- Random upper case letter
                    CHAR(48 + Rnd % 10) As Digit -- Random digit
        ) AS Chars
        WHERE  T0.N <> -T1.N -- Needed for the subquery to get re-evaluated for each row
        FOR XML PATH('') 
        ), @Suffix) As RandomString
FROM Tally As T1 
CROSS APPLY
(
    -- Select a random length between @MinLength and @MaxLength (inclusive)
    SELECT TOP 1 N As Length
    FROM Tally As T2
    CROSS JOIN GuidGenerator 
    WHERE T2.N >= @MinLength
    AND T2.N <= @MaxLength
    AND T2.N <> T1.N
    ORDER BY NewGuid
) As Lengths;

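The function above returns random strings based on its parameters. For example, the query below generates 100 random strings in the format Test_Product_. The result set contains duplicate values that need to be excluded. I tried applying ROW_NUMBER, but it slows the query down and the result no longer contains the requested number of rows.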

SELECT * FROM dbo.MaskGenerator('Test_Product_',null,1,4,100,4) ORDER BY 2
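
The function reads from a GuidGenerator view that is not shown in the question. SQL Server does not allow NEWID() to be called directly inside a user-defined function, so wrapping it in a view is a common workaround. A minimal sketch, assuming the view only needs to expose the single NewGuid column that the function selects:

```sql
-- Assumed definition: the question does not include this view.
-- The function only requires a column named NewGuid that yields a fresh GUID.
-- CREATE VIEW must be the first statement in its batch.
CREATE VIEW dbo.GuidGenerator
AS
SELECT NEWID() AS NewGuid;
```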

I made a fiddle demo here: SQL Fiddle, and my attempt is there as well.

Tags: sql, sql-server, unique, distinct

Solution


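Basically, this is an effect of the birthday problem.
So far, the best solution I can offer is to generate twice as many random strings and then select the first 100 distinct values from them: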

SELECT TOP 100 RandomString, ROW_NUMBER() OVER(ORDER BY @@SPID) As Number
FROM 
(
  SELECT DISTINCT RandomString 
  FROM dbo.MaskGenerator('Test_Product_',null,1,4,200,4)
) As Rnd
ORDER BY RandomString
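
The birthday-problem argument can be checked with a little arithmetic. The sketch below assumes that the function picks each length from 1 to 4 with roughly equal probability and each digit uniformly, so a specific string of length L appears in a given row with probability 1/(4 * 10^L). Under that assumption it estimates the expected number of distinct strings among @n generated rows. The variable @n and the column names are illustrative, not part of the original answer.

```sql
DECLARE @n int = 200;  -- number of rows requested from dbo.MaskGenerator

-- For each length L there are 10^L possible digit strings.
-- A given string is missed by all @n rows with probability (1 - p)^@n,
-- so each string contributes 1 - (1 - p)^@n to the expected distinct count.
SELECT  SUM(Strings * (1e0 - POWER(1e0 - 1e0 / (4e0 * Strings), @n))) AS ExpectedDistinct
FROM    (VALUES (1), (2), (3), (4)) AS Lengths(L)
CROSS APPLY (SELECT POWER(10e0, L) AS Strings) AS S;
```

Under these assumptions the estimate comes out at roughly 81 distinct values for @n = 100 and roughly 148 for @n = 200, which would explain why 100 rows fall short while 200 rows leave a comfortable margin.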

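This might look like a waste, since you generate twice as many random strings as you need, but:

  1. I'm not sure that is actually the case. The query optimizer might stop execution once you have 100 distinct values.

  2. My performance tests of this function (on a relatively powerful SQL Server 2016 machine) show that it is lightning fast, at least for a small number of strings:

    • Generating 200 strings takes about 23 milliseconds on average.
    • Generating 2,000 strings takes about 55 milliseconds on average.
    • Generating 100,000 strings takes about 2.8 seconds on average.

However, generating 1,000,000 strings takes about 30 seconds on average.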
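
The answer does not say how these timings were taken. One possible way to collect comparable measurements is SQL Server's SET STATISTICS TIME option, which reports CPU and elapsed times in the Messages output. In the sketch below, #Timing is a hypothetical temp table used only so that sending rows to the client does not dominate the measurement.

```sql
SET STATISTICS TIME ON;

-- Materialize the rows into a temp table instead of returning them to the client
SELECT RandomString
INTO   #Timing
FROM   dbo.MaskGenerator('Test_Product_', NULL, 1, 4, 200, 4);

SET STATISTICS TIME OFF;

DROP TABLE #Timing;
```

Running the batch several times and averaging the reported elapsed time would give figures in the same spirit as the ones above, though the absolute numbers depend on the hardware.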

