Generating a table from random values taken from other tables in PostgreSQL

Problem description

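The following code uses uuid_generate_v4() to generate random values for the Id column across 100000 rows. However, the nested selects always pick the same row, so every inserted row has the same values in those columns. The goal is to create a table with 100k rows whose values are drawn at random from other sample tables. Each sample table has only two columns (Id and the column the value is taken from). How can this be achieved?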

insert into "Tag" (
    "Id", "Time", "Account", "Name", "Value", "RollUpTableId"
)
select
    uuid_generate_v4(),
    current_timestamp,
    (select "Account" from "AccountSamples" OFFSET floor(random()*358) LIMIT 1),
    (select "Name" from "TagNameSamples" OFFSET floor(random()*19) LIMIT 1),
    (select "Value" from "TagValueSamples" OFFSET floor(random()*26) LIMIT 1),
    uuid_generate_v4()
from generate_series(1, 100000);

I also tried select "Account" from "AccountSamples" where "Id" = (trunc(random() * 358)::integer).

Tags: sql, postgresql, random, subquery, window-functions

Solution


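Most likely, Postgres is optimizing the subqueries and does not re-execute them for each row. None of the subqueries references a column of the outer query, so each one can be evaluated once and its result reused for all 100000 rows.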
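To illustrate that point, here is a minimal sketch (not part of the original answer) of one common workaround: referencing an outer column inside the subquery makes it correlated, so it should be evaluated for each outer row. The temp table and its 'account-N' values are hypothetical stand-ins for the real sample data, and the alias g(n) is an assumption.

```sql
-- Hypothetical sample data standing in for the real "AccountSamples" table.
create temp table "AccountSamples" ("Id" int, "Account" text);
insert into "AccountSamples"
select n, 'account-' || n
from generate_series(1, 358) as s(n);

-- The predicate on g.n is always true; it is only there so that the
-- subquery depends on the outer row and is re-evaluated for each one.
select
    g.n,
    (select "Account"
     from "AccountSamples"
     where g.n = g.n
     order by random()
     limit 1) as "Account"
from generate_series(1, 10) as g(n);
```

This sorts the sample table once per generated row, which is acceptable for a few hundred sample rows but may become slow for large sample tables.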

I would suggest enumerating the rows randomly in subqueries, then joining:

select uuid_generate_v4(), a."Account", tns."Name", tvs."Value"
from (
    select "Account", row_number() over(order by random()) rn from "AccountSamples"
) a
inner join (
    select "Name",    row_number() over(order by random()) rn from "TagNameSamples"
) tns on tns.rn = a.rn
inner join (
    select "Value",   row_number() over(order by random()) rn from "TagValueSamples"
) tvs on tvs.rn = a.rn
where a.rn <= 10

This is not exactly the same logic as the original query, since a given row can be picked only once, but I think it is a reasonable approximation.

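If some of your tables may have fewer than 10 rows, the inner joins above return fewer rows than requested, so generate_series() with left joins is safer. Every requested row is kept, and a column is NULL wherever its sample table runs out of rows: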

select uuid_generate_v4(), a."Account", tns."Name", tvs."Value"
from generate_series(1, 10) x(rn)
left join (
    select "Account", row_number() over(order by random()) rn from "AccountSamples"
) a on a.rn = x.rn
left join (
    select "Name",    row_number() over(order by random()) rn from "TagNameSamples"
) tns on tns.rn = x.rn
left join (
    select "Value",   row_number() over(order by random()) rn from "TagValueSamples"
) tvs on tvs.rn = x.rn
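Both queries above pick each sample row at most once, so they cannot produce more rows than the sample tables contain. If sampling with replacement is needed, as in the original 100k-row goal, one possible variation is to draw a random position for each generated row and join on it. This is a sketch under assumptions, not the answerer's method: the temp table, its 'tag-N' values, and the CTE names numbered and picks are hypothetical, and only one sample table is shown.

```sql
-- Hypothetical stand-in for one sample table; the others would be handled the same way.
create temp table "TagNameSamples" ("Id" int, "Name" text);
insert into "TagNameSamples"
select n, 'tag-' || n
from generate_series(1, 19) as s(n);

with numbered as (
    -- Give every sample row a stable position 1..count.
    select "Name", row_number() over (order by "Id") as rn
    from "TagNameSamples"
),
picks as (
    -- One random position per output row; random() is evaluated for each row here.
    select g.n, 1 + floor(random() * c.cnt)::int as rn
    from generate_series(1, 100000) as g(n)
    cross join (select count(*) as cnt from "TagNameSamples") as c
)
select p.n, t."Name"
from picks p
join numbered t on t.rn = p.rn;
```

Because the row count is read from the table itself, the hard-coded sizes (358, 19, 26) from the question are not needed in this variant.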
