首页 > 解决方案 > 运行 SQL 执行即时查询以在 python 函数内的 Bigquery 中合并表

问题描述

希望您能帮助我理解,我正在尝试运行我在 Python 中构建的执行即时查询,这非常复杂。因此,我每次都可以轻松更改参数。但是 Python 在抱怨

BadRequest: 400 Query error: Scalar subquery produced more than one element at [1:19]

你能帮我确定发生了什么吗?

该查询帮助我根据第一个数据帧的 ID 在第二个数据帧中运行数据透视表

        def aggre_secondary_data(dataset, table, table2, uniqid, date_column, brand_column, 
                                                                       var_agg=[], agg= "sum",  

        date_format='%m'):
        table_name = 'project.'+ dataset + '.' + table
        table_name_2 = 'project.'+ dataset + '.' + table2
            
        query0 = "SELECT * FROM (SELECT " + uniqid + " , " + brand_column + " || '_' || as 
         brand_month, " +", ".join(var_agg)
        query1 = " FROM `%s`"%(table_name_2) + "\n" + "WHERE " + uniqid + " IN (SELECT DISTINCT 
        CAST(" + uniqid + " as STRING) as " +  uniqid + "\n"
        query2 = "FROM `%s`"%(table_name) + "as reduced_df) " + "\n" + ")" 
        
        query3 = query0 + query1 + query2 
        
        pivot_f = "\n"+ "PIVOT "
        
        temp =''
        for i in var_agg:
            temp = temp + (" " + agg + "(" + i + ") " + i )
            pivot = "(" + temp + " FOR brand_month IN (" + """'"""  +'''"""''' +  " ||  
              string_agg(brand_month,"
            pivot2 = pivot +  ''' "','" '''  +  " )" + """ || """ +  '''"""''' + "'"      + "))" 
           +"\n"+ " ORDER BY " + uniqid
        
        query_brand_month = "SELECT " + brand_column + " || '_' || Month brand_month"
        query_date = "SELECT DISTINCT Month FROM `%s`"%(table_name_2)
        query_brands = "SELECT DISTINCT " + brand_column +  " FROM `%s`"%(table_name_2)
           
        final_query = """SELECT '''"""  + query3 + pivot_f+ pivot2 + '\n' + """ ''' """ +"\n"+ 
        "FROM (" +"\n"+ query_brand_month + "\n"+ " FROM (" + query_brands +")" +"\n"+ "CROSS JOIN 
        (" 
        + query_date +" ) " +"\n"+ "ORDER BY brand_month )"
            
        return final_query
    
    def execute_query(dataset,table,table2):
        client = bigquery.Client()
        job_config = bigquery.QueryJobConfig(use_legacy_sql=False)
        
        temp = 'execute immediate ('
        main_query = aggre_secondary_data("blue_test","data1","test", uniqid = "ID", date_column = "Month" , 
                         brand_column ="PRODUCT_" , var_agg=["P","R"], agg= "sum", date_format='%m')
        sql= temp + main_query + ")"
        query_job = client.query(sql, job_config=job_config)
        df = query_job.result().to_dataframe()
        return df
    
    execute_query(dataset="blue_test",table="data1",table2="test")

我知道如果我在 python 中运行查询,我会这样做:

    query_4 = f"""\
    #standardsql      
    execute immediate ( 
    
    SELECT .......
    '
    )
    """
    df_final = bqclient.query(query).result().to_dataframe()

这是打印出来的 final_query:

    SELECT '''SELECT * FROM (SELECT ID , PRODUCT || '_' || EXTRACT(month FROM PARSE_DATETIME( '%d/%m/%Y %H:%M', Date )) Month brand_month, P, R FROM `project.test_string_date`
WHERE ID IN (SELECT DISTINCT CAST(ID as INT) as ID
FROM `project.blue_test.data1`as reduced_df) 
)
PIVOT ( sum(P) P, sum(R) R FOR brand_month IN ("''' ||  string_agg(brand_month, '","' ) || '''"))
 ORDER BY ID
 ''' 
FROM (
SELECT PRODUCT || '_' || Month brand_month
 FROM (SELECT DISTINCT PRODUCT FROM `project.blue_test.test_string_date`)
CROSS JOIN (SELECT DISTINCT EXTRACT(month FROM PARSE_DATETIME( '%d/%m/%Y %H:%M', Date )) as Month FROM `project.blue_test.test_string_date` ) 
ORDER BY brand_month )

这导致了这个查询:

SELECT * FROM (SELECT ID , PRODUCT || '_' || EXTRACT(month FROM PARSE_DATETIME( '%d/%m/%Y %H:%M', Date )) as Month, brand_month, P, R FROM `project.test_string_date`
WHERE ID IN (SELECT DISTINCT CAST(ID as INT) as ID
FROM `project.blue_test.data1`as reduced_df) 
)
PIVOT ( sum(P) P, sum(R) R FOR brand_month IN ("a_1","a_10","a_2","a_3","a_4","a_5","a_6","a_7","a_8","a_9","b_1","b_10","b_2","b_3","b_4","b_5","b_6","b_7","b_8","b_9","c_1","c_10","c_2","c_3","c_4","c_5","c_6","c_7","c_8","c_9","d_1","d_10","d_2","d_3","d_4","d_5","d_6","d_7","d_8","d_9"))
 ORDER BY ID

它抱怨品牌月不应该在那里。我不明白为什么

这是显示的错误:

Unrecognized name: brand_month at [1:117]

这是我的第二张桌子的格式

ID 日期 产品公关

1234 01/05/2020 00:00 一 34 12

1233 01/03/2020 00:00 b 54 0

标签: google-bigquery

解决方案


这是打印出来的 final_query:

SELECT '''SELECT * FROM (SELECT ID , PRODUCT || '_' || EXTRACT(month FROM PARSE_DATETIME( '%d/%m/%Y %H:%M', Date )) Month brand_month, P, R FROM `project.blue_test.test_string_date`    
...    
...    

它抱怨品牌月不应该在那里。我不明白为什么

Month您应该在之前删除brand_month,所以第一行应该看起来像

SELECT '''SELECT * FROM (SELECT ID , PRODUCT || '_' || EXTRACT(month FROM PARSE_DATETIME( '%d/%m/%Y %H:%M', Date )) brand_month, P, R FROM `project.blue_test.test_string_date`    
...    
...    

推荐阅读