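python - Python BigQuery: inserting a string whose content is a number fails
Problem description
I am trying to write some code that inserts data into Google BigQuery, but I can't work out what is going wrong with the content of STRING fields. Apparently BigQuery has trouble with strings such as "1.1" or "1".
Consider a BigQuery table with the following minimal schema (a single string field named "stringer"):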
[
{
"description": "string_debug",
"mode": "NULLABLE",
"name": "stringer",
"type": "STRING"
}
]
from google.cloud import bigquery
client = bigquery.Client()
dataset_id = 'bqsoba'
table_id = 'stringer'
dataset_ref = client.dataset(dataset_id)
table_ref = dataset_ref.table(table_id)
job_config = bigquery.LoadJobConfig()
job_config.source_format = bigquery.SourceFormat.NEWLINE_DELIMITED_JSON
job_config.autodetect = False
job = client.load_table_from_json([{'stringer':'1'}], table_ref, job_config=job_config)
job.result() # Waits for table load to complete.
print("Loaded {} rows into {}:{}.".format(job.output_rows, dataset_id, table_id))
This results in the following exception:
---------------------------------------------------------------------------
BadRequest Traceback (most recent call last)
<ipython-input-46-3d68dddf9573> in <module>
13 job = client.load_table_from_json(parsed[11:20], table_ref, job_config=job_config)
14
---> 15 job.result() # Waits for table load to complete.
16
17 print("Loaded {} rows into {}:{}.".format(job.output_rows, dataset_id, table_id))
~/apps/conda2019/lib/python3.7/site-packages/google/cloud/bigquery/job.py in result(self, timeout, retry)
777 self._begin(retry=retry)
778 # TODO: modify PollingFuture so it can pass a retry argument to done().
--> 779 return super(_AsyncJob, self).result(timeout=timeout)
780
781 def cancelled(self):
~/apps/conda2019/lib/python3.7/site-packages/google/api_core/future/polling.py in result(self, timeout)
125 # pylint: disable=raising-bad-type
126 # Pylint doesn't recognize that this is valid in this case.
--> 127 raise self._exception
128
129 return self._result
BadRequest: 400 Provided Schema does not match Table bi-project-231313:bqsoba.sa_hardware_collector. Field sysconfig.call_home_token has changed type from STRING to INTEGER
Is it possible to make the BigQuery insert treat "1" as a string value?
Solution
You can define the schema manually:
schema = [{
"mode": "NULLABLE",
"name": "stringer",
"type": "STRING"
}]
Then set it on the job configuration before the job runs:
job_config.schema = schema
Hope this helps.