首页 > 解决方案 > BigQuery 更新失败,但仅在使用 Python API 进行批处理时

问题描述

我正在尝试使用批量更新语句更新表。DML 查询在 BigQuery Web UI 中成功执行,但在批处理时,第一个查询成功,而其他查询则失败。为什么是这样?

示例查询:

query = '''
update `project.dataset.Table`
set my_fk = 1234
where other_fk = 222 and
  received >= PARSE_TIMESTAMP("%Y-%m-%d %H:%M:%S", "2018-01-22 05:28:12") and 
  received <= PARSE_TIMESTAMP("%Y-%m-%d %H:%M:%S", "2018-01-26 02:31:51")
'''

示例代码:

job_config = bigquery.QueryJobConfig()
job_config.priority = bigquery.QueryPriority.BATCH

queries = [] # list of DML Strings
jobs = []
for query in queries:
    job = client.query(query, location='US', job_config=job_config)
    jobs.append(job)

作业输出:

for job in jobs[1:]:
    print(job.state)
    # Done

    print(job.error_result)
    # {'message': 'Cannot set destination table in jobs with DML statements',
    # 'reason': 'invalidQuery'}

    print(job.use_legacy_sql)
    # False

    print(job.job_type)
    # Query

标签: google-bigquery

解决方案


我怀疑问题是在插入第一个作业后,BigQuery APIjob_config填充了一些字段(destination特别是)。然后,第二个作业将失败,因为它将是作业配置中带有目标表的 DML 语句。您可以通过以下方式验证:

for query in queries:
    print(job_config.destination)
    job = client.query(query, location='US', job_config=job_config)
    print(job_config.destination)
    jobs.append(job)

要解决此问题,您可以避免job_config对所有作业重复使用相同的内容:

for query in queries:
    job_config = bigquery.QueryJobConfig()
    job_config.priority = bigquery.QueryPriority.BATCH
    job = client.query(query, location='US', job_config=job_config)
    jobs.append(job)

推荐阅读