google-bigquery - 运行 SQL 执行即时查询以在 python 函数内的 Bigquery 中合并表
问题描述
希望您能帮助我理解,我正在尝试运行我在 Python 中构建的执行即时查询,这非常复杂。因此,我每次都可以轻松更改参数。但是 Python 在抱怨
BadRequest: 400 Query error: Scalar subquery produced more than one element at [1:19]
你能帮我确定发生了什么吗?
该查询帮助我根据第一个数据帧的 ID 在第二个数据帧中运行数据透视表
def aggre_secondary_data(dataset, table, table2, uniqid, date_column, brand_column,
var_agg=[], agg= "sum",
date_format='%m'):
table_name = 'project.'+ dataset + '.' + table
table_name_2 = 'project.'+ dataset + '.' + table2
query0 = "SELECT * FROM (SELECT " + uniqid + " , " + brand_column + " || '_' || as
brand_month, " +", ".join(var_agg)
query1 = " FROM `%s`"%(table_name_2) + "\n" + "WHERE " + uniqid + " IN (SELECT DISTINCT
CAST(" + uniqid + " as STRING) as " + uniqid + "\n"
query2 = "FROM `%s`"%(table_name) + "as reduced_df) " + "\n" + ")"
query3 = query0 + query1 + query2
pivot_f = "\n"+ "PIVOT "
temp =''
for i in var_agg:
temp = temp + (" " + agg + "(" + i + ") " + i )
pivot = "(" + temp + " FOR brand_month IN (" + """'""" +'''"""''' + " ||
string_agg(brand_month,"
pivot2 = pivot + ''' "','" ''' + " )" + """ || """ + '''"""''' + "'" + "))"
+"\n"+ " ORDER BY " + uniqid
query_brand_month = "SELECT " + brand_column + " || '_' || Month brand_month"
query_date = "SELECT DISTINCT Month FROM `%s`"%(table_name_2)
query_brands = "SELECT DISTINCT " + brand_column + " FROM `%s`"%(table_name_2)
final_query = """SELECT '''""" + query3 + pivot_f+ pivot2 + '\n' + """ ''' """ +"\n"+
"FROM (" +"\n"+ query_brand_month + "\n"+ " FROM (" + query_brands +")" +"\n"+ "CROSS JOIN
("
+ query_date +" ) " +"\n"+ "ORDER BY brand_month )"
return final_query
def execute_query(dataset,table,table2):
client = bigquery.Client()
job_config = bigquery.QueryJobConfig(use_legacy_sql=False)
temp = 'execute immediate ('
main_query = aggre_secondary_data("blue_test","data1","test", uniqid = "ID", date_column = "Month" ,
brand_column ="PRODUCT_" , var_agg=["P","R"], agg= "sum", date_format='%m')
sql= temp + main_query + ")"
query_job = client.query(sql, job_config=job_config)
df = query_job.result().to_dataframe()
return df
execute_query(dataset="blue_test",table="data1",table2="test")
我知道如果我在 python 中运行查询,我会这样做:
query_4 = f"""\
#standardsql
execute immediate (
SELECT .......
'
)
"""
df_final = bqclient.query(query).result().to_dataframe()
这是打印出来的 final_query:
SELECT '''SELECT * FROM (SELECT ID , PRODUCT || '_' || EXTRACT(month FROM PARSE_DATETIME( '%d/%m/%Y %H:%M', Date )) Month brand_month, P, R FROM `project.test_string_date`
WHERE ID IN (SELECT DISTINCT CAST(ID as INT) as ID
FROM `project.blue_test.data1`as reduced_df)
)
PIVOT ( sum(P) P, sum(R) R FOR brand_month IN ("''' || string_agg(brand_month, '","' ) || '''"))
ORDER BY ID
'''
FROM (
SELECT PRODUCT || '_' || Month brand_month
FROM (SELECT DISTINCT PRODUCT FROM `project.blue_test.test_string_date`)
CROSS JOIN (SELECT DISTINCT EXTRACT(month FROM PARSE_DATETIME( '%d/%m/%Y %H:%M', Date )) as Month FROM `project.blue_test.test_string_date` )
ORDER BY brand_month )
这导致了这个查询:
SELECT * FROM (SELECT ID , PRODUCT || '_' || EXTRACT(month FROM PARSE_DATETIME( '%d/%m/%Y %H:%M', Date )) as Month, brand_month, P, R FROM `project.test_string_date`
WHERE ID IN (SELECT DISTINCT CAST(ID as INT) as ID
FROM `project.blue_test.data1`as reduced_df)
)
PIVOT ( sum(P) P, sum(R) R FOR brand_month IN ("a_1","a_10","a_2","a_3","a_4","a_5","a_6","a_7","a_8","a_9","b_1","b_10","b_2","b_3","b_4","b_5","b_6","b_7","b_8","b_9","c_1","c_10","c_2","c_3","c_4","c_5","c_6","c_7","c_8","c_9","d_1","d_10","d_2","d_3","d_4","d_5","d_6","d_7","d_8","d_9"))
ORDER BY ID
它抱怨品牌月不应该在那里。我不明白为什么
这是显示的错误:
Unrecognized name: brand_month at [1:117]
这是我的第二张桌子的格式
ID 日期 产品公关
1234 01/05/2020 00:00 一 34 12
1233 01/03/2020 00:00 b 54 0
解决方案
这是打印出来的 final_query:
SELECT '''SELECT * FROM (SELECT ID , PRODUCT || '_' || EXTRACT(month FROM PARSE_DATETIME( '%d/%m/%Y %H:%M', Date )) Month brand_month, P, R FROM `project.blue_test.test_string_date`
...
...
它抱怨品牌月不应该在那里。我不明白为什么
Month
您应该在之前删除brand_month
,所以第一行应该看起来像
SELECT '''SELECT * FROM (SELECT ID , PRODUCT || '_' || EXTRACT(month FROM PARSE_DATETIME( '%d/%m/%Y %H:%M', Date )) brand_month, P, R FROM `project.blue_test.test_string_date`
...
...
推荐阅读
- django - django,添加密码以限制谁可以在我的网站上注册
- r - 如何将希腊字母添加到 Facet_Grid 条形标签?
- php - 在 mySQL 中管理 ID
- flutter - 类型“_TypeError”不是“bool”类型的子类型
- reactjs - 在 NextJS 中通过 window.postMessage 获取跨窗口消息错误
- javascript - 从Javascript中的字符串中删除短语
- python - 如何将 TrueType 字体文件添加到 pyinstaller 可执行文件以与 Pygame 一起使用?
- styled-components - 子组件功能不更新 DOM
- python-3.x - Python ctypes winapi GetUserProfileDirectory 访问冲突读取
- c++ - 了解 clang 循环优化