Python BigQuery: inserting a string whose content is a number fails

Problem description

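I am trying to write some code that inserts data into Google BigQuery, but I cannot make sure that the content of a STRING field is stored as the string it actually is. Apparently BigQuery has problems with strings such as "1.1" or "1".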

Consider a BigQuery table with the following minimal schema (just one string field named "stringer"):

[
  {
    "description": "string_debug",
    "mode": "NULLABLE",
    "name": "stringer",
    "type": "STRING"
  }
]

and the following Python code:

from google.cloud import bigquery
client = bigquery.Client()
dataset_id = 'bqsoba'
table_id = 'stringer'

dataset_ref = client.dataset(dataset_id)
table_ref = dataset_ref.table(table_id)
job_config = bigquery.LoadJobConfig()
job_config.source_format = bigquery.SourceFormat.NEWLINE_DELIMITED_JSON
job_config.autodetect = False

job = client.load_table_from_json([{'stringer':'1'}], table_ref, job_config=job_config)

job.result()  # Waits for table load to complete.

print("Loaded {} rows into {}:{}.".format(job.output_rows, dataset_id, table_id))

This results in an exception:

---------------------------------------------------------------------------
BadRequest                                Traceback (most recent call last)
<ipython-input-46-3d68dddf9573> in <module>
     13 job = client.load_table_from_json(parsed[11:20], table_ref, job_config=job_config)
     14 
---> 15 job.result()  # Waits for table load to complete.
     16 
     17 print("Loaded {} rows into {}:{}.".format(job.output_rows, dataset_id, table_id))

~/apps/conda2019/lib/python3.7/site-packages/google/cloud/bigquery/job.py in result(self, timeout, retry)
    777             self._begin(retry=retry)
    778         # TODO: modify PollingFuture so it can pass a retry argument to done().
--> 779         return super(_AsyncJob, self).result(timeout=timeout)
    780 
    781     def cancelled(self):

~/apps/conda2019/lib/python3.7/site-packages/google/api_core/future/polling.py in result(self, timeout)
    125             # pylint: disable=raising-bad-type
    126             # Pylint doesn't recognize that this is valid in this case.
--> 127             raise self._exception
    128 
    129         return self._result

BadRequest: 400 Provided Schema does not match Table bi-project-231313:bqsoba.sa_hardware_collector. Field sysconfig.call_home_token has changed type from STRING to INTEGER

Is it possible to make the BigQuery insert treat "1" as a string value?

Tags: python, google-bigquery

Solution


You can define the schema manually:

schema = [
    {
        "mode": "NULLABLE",
        "name": "stringer",
        "type": "STRING"
    }
]

Then assign it to the job configuration before the job runs:

job_config.schema = schema
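
Putting the pieces together, here is a minimal sketch of the whole load with an explicit schema. The error message suggests the field type was inferred as INTEGER from the value "1"; supplying the schema yourself should leave nothing to infer. The helper names `string_schema` and `load_rows_as_strings` are hypothetical and only for illustration, and the sketch assumes every field is a NULLABLE STRING and that default credentials and project are configured.

```python
def string_schema(field_names):
    """Describe each named field as a NULLABLE STRING (BigQuery JSON schema form)."""
    return [
        {"name": name, "type": "STRING", "mode": "NULLABLE"}
        for name in field_names
    ]


def load_rows_as_strings(rows, dataset_id, table_id, field_names):
    """Load rows (a list of dicts) into a table using an explicit STRING schema.

    Hypothetical helper: needs google-cloud-bigquery and valid credentials.
    """
    # Imported here so string_schema() stays usable without the client library.
    from google.cloud import bigquery

    client = bigquery.Client()
    table_ref = bigquery.DatasetReference(client.project, dataset_id).table(table_id)

    job_config = bigquery.LoadJobConfig()
    job_config.source_format = bigquery.SourceFormat.NEWLINE_DELIMITED_JSON
    job_config.autodetect = False
    # Convert the plain dicts into SchemaField objects for the job configuration.
    job_config.schema = [
        bigquery.SchemaField.from_api_repr(field)
        for field in string_schema(field_names)
    ]

    job = client.load_table_from_json(rows, table_ref, job_config=job_config)
    job.result()  # Waits for the load job to complete.
    return job.output_rows
```

With the table from the question, the call would be `load_rows_as_strings([{"stringer": "1"}], "bqsoba", "stringer", ["stringer"])`.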

Hope this helps.

