How can I compare multiple fields across multiple tables at scale?

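Problem description

I have multiple tables T1, T2, ..., Tn.

I want to compare the fields in T1, T2, and so on, to see whether fields with the same name hold the same values.

For example:
T1
column_a

T2
column_a

So this query: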

SELECT T1.column_a, T2.column_a
FROM T1
JOIN T2
ON T1.column_a = T2.column_a

is simple enough, but how can I do this at scale? Is there a way to do it in Python?

For example, in pseudocode:

get column1 in T1
if column1 in T2, T3...Tn
compare values and get percentage match

get column2 in T2
if column2 in T2, T3...Tn
compare values and get percentage match

This is the output I want.

Tags: python, sql, database

Solution


In SQL, you can use union all and aggregation:

-- outer query: count how many values share each (num_t1, num_t2, num_t3) pattern
select num_t1, num_t2, num_t3, count(*),
       min(column_a), max(column_a)
from (-- middle query: per value, count its occurrences in each table
      select column_a, sum(t1) as num_t1, sum(t2) as num_t2, sum(t3) as num_t3
      from ((select column_a, 1 as t1, 0 as t2, 0 as t3
             from table1
            ) union all
            (select column_a, 0 as t1, 1 as t2, 0 as t3
             from table2
            ) union all
            (select column_a, 0 as t1, 0 as t2, 1 as t3
             from table3
            )
           ) t
      group by column_a
     ) t
group by num_t1, num_t2, num_t3;

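This counts how many times each value appears in each of the three (or "n") tables, then groups the values by that pattern of counts. You can use max() instead of sum() if you only want to see whether a value is present in a table.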
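
Since the question asks about Python, here is a minimal sketch of how the same union all and aggregation idea might be generated for any number of tables. It assumes an SQLite database accessed through the standard sqlite3 module. The helper names (quote_ident, columns_of, build_presence_query, percent_match, compare_shared_columns) are hypothetical, and "percentage match" is interpreted here as the share of distinct values that appear in every table having the column. That definition is an assumption, not something stated in the question.

```python
import sqlite3


def quote_ident(name):
    """Quote a table or column name for use in SQLite SQL text."""
    return '"' + name.replace('"', '""') + '"'


def columns_of(conn, table):
    """Return the set of column names of `table` (via PRAGMA table_info)."""
    rows = conn.execute(f"PRAGMA table_info({quote_ident(table)})").fetchall()
    return {row[1] for row in rows}  # row[1] holds the column name


def build_presence_query(tables, column):
    """Build the union all + aggregation query for one column over n tables.

    Each branch tags its rows with 0/1 flags (in_t0, in_t1, ...); max()
    per value then tells whether the value is present in each table.
    """
    col = quote_ident(column)
    n = len(tables)
    branches = []
    for i, table in enumerate(tables):
        flags = ", ".join(f"{1 if i == j else 0} AS in_t{j}" for j in range(n))
        branches.append(f"SELECT {col} AS v, {flags} FROM {quote_ident(table)}")
    maxes = ", ".join(f"MAX(in_t{j}) AS in_t{j}" for j in range(n))
    union = " UNION ALL ".join(branches)
    return f"SELECT v, {maxes} FROM ({union}) AS u GROUP BY v"


def percent_match(conn, tables, column):
    """Percentage of distinct values of `column` found in all `tables`.

    This definition of "match" is an assumption made for illustration.
    """
    rows = conn.execute(build_presence_query(tables, column)).fetchall()
    if not rows:
        return 0.0
    in_all = sum(1 for row in rows if all(row[1:]))
    return 100.0 * in_all / len(rows)


def compare_shared_columns(conn, tables):
    """For each column of the first table that also exists (same spelling
    and case) in at least one other table, return
    {column: (tables_having_it, percent_match)}."""
    cols = {t: columns_of(conn, t) for t in tables}
    result = {}
    for column in sorted(cols[tables[0]]):
        having = [t for t in tables if column in cols[t]]
        if len(having) > 1:
            result[column] = (having, percent_match(conn, having, column))
    return result
```

Calling compare_shared_columns(conn, ["T1", "T2", "T3"]) on an open sqlite3 connection would return one entry per shared column name. For another database, the column lookup would need that system's catalog (for example information_schema.columns) in place of PRAGMA table_info.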

