首页 > 解决方案 > 如何以有效的方式使用 Python 自动化 SQL 代码

问题描述

我有一个如下所示的 sql 代码: 数据库:RedShift

WITH
X as
(  
 SELECT  distinct pn , pg , ic, sr  , cm, fq , m1  , m2  , m3  , m4
 FROM table1  ORDER BY 1,2,3
),

table2 AS
 (
   Select g,p,t , avg(ss) as ss , avg(ce) as ce , sum(av) as ps
  from
 (
select distinct ic AS g , pn AS p , cm AS t , ss , cast((sum_m1/nullif(sum_m2,0)) as decimal(3,2)) as 
 ce , av
from
(
  select *
  , cast((sum(m3) over (partition by ic, pn,cm)) as decimal) as ss
  , sum(m1) over (partition by ic, pn,cm) as sum_m1
  , sum(m2) over (partition by ic, pn,cm) as sum_m2
  , cast((avg(m2) over (partition by ic, pn,cm)) as decimal) as av
  from X
  ORDER BY 1,2,3
)
order by 1,2,3
 )
 where ss is not null
  group by 1,2,3
  order by 1,2,3
  )

按值分组g,p,t每次都会更改,因此它为每个新的g,p,t值组合创建表。

自动化的一种方法是在 Python 中转储此 sql 代码,这可能效率低下:这是我的方法->我用{}大括号替换列表中的所有值示例说:

我将所有可能的分组值存储在一个列表中。

G=[g1,g2,g3]

P=[p1,p2,p3]

T=[t1,t2,t3]

连接到数据库:

   c= psycopg2.connect(database=db,host=host,port=port,user=user,password=password,sslmode='require')
   data2={}

   for g in G:

     for p in P:

        for t in T:
           sqlstr=( """ WITH
                            X as
                          (  
                 SELECT  distinct pn , pg , ic, sr  , cm, fq , m1  , m2  , m3  , m4
              FROM table1  ORDER BY 1,2,3
                        ),

                  table2 AS
              Select {},{},{} , avg(ss) as ss , avg(ce) as ce , sum(av) as ps
                             from
                         (
                    select distinct ic AS g , pn AS p , cm AS t , ss , cast((sum_m1/nullif(sum_m2,0)) as decimal(3,2)) as 
                      ce , av
                          from
                  (
                       select *
               , cast((sum(m3) over (partition by ic, pn,cm)) as decimal) as ss
                , sum(m1) over (partition by ic, pn,cm) as sum_m1
               , sum(m2) over (partition by ic, pn,cm) as sum_m2
               , cast((avg(m2) over (partition by ic, pn,cm)) as decimal) as av
                      from X
                     ORDER BY 1,2,3
                       )
                   order by 1,2,3
                    )
                 where ss is not null
                   group by 1,2,3
                   order by 1,2,3
                    ), select * from table2 """).format(g,p,t)
  data2[g+"_"+p+"_"+t] = pd.read_sql_query(sqlstr, c)

有没有更好的方法来传递参数,如上面的 {} 代码序列应该保持按顺序传递参数?

我们可以使用 SQL 以外的其他方法吗?以pythonic方式?

标签: python-3.xpandasgroup-by

解决方案


推荐阅读