Streaming inserts in BigQuery with Python

Problem description

The Python SDK's Client.insert_rows is documented as:

Insert rows into a table via the streaming API.

See https://cloud.google.com/bigquery/docs/reference/rest/v2/tabledata/insertAll
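
For context, a minimal sketch of the call in question. The project, dataset, and table names are taken from the traceback below; the two-column schema and row values are hypothetical:

    from google.cloud import bigquery

    client = bigquery.Client(project="myproject")
    # Fetch the table so insert_rows knows the schema.
    table = client.get_table("myproject.mydataset.mytable")

    # Hypothetical rows matching the table schema. insert_rows accepts any
    # iterable, including a generator.
    rows = [("Alice", 30), ("Bob", 25)]
    errors = client.insert_rows(table, rows)  # returns a list of per-row error mappings
    if errors:
        print(errors)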

But when I try to use it with a large dataset that needs to be streamed, I get this error:

Traceback (most recent call last):
  File "demo.py", line 15, in <module>
    exit(main())
  File "demo.py", line 12, in main
    client.insert_rows(table, rows)
  File "google/cloud/bigquery/bigquery_future/client.py", line 1213, in insert_rows
    return self.insert_rows_json(table, json_rows, **kwargs)
  File "google/cloud/bigquery/bigquery_future/client.py", line 1293, in insert_rows_json
    data=data)
  File "google/cloud/bigquery/bigquery_future/client.py", line 301, in _call_api
    return call()
  File "google/api_core/retry.py", line 246, in retry_wrapped_func
    on_error=on_error,
  File "google/api_core/retry.py", line 163, in retry_target
    return target()
  File "google/cloud/core_future/_http.py", line 279, in api_request
    raise exceptions.from_http_response(response)
google.api_core.exceptions.BadRequest: 400 POST https://www.googleapis.com/bigquery/v2/projects/myproject/datasets/mydataset/tables/mytable/insertAll: Request payload size exceeds the limit: 10915700 bytes.

Digging into the code, it clearly makes two passes over the data (which I had carefully yielded from a generator) before sending the POST request to the REST API mentioned in the docs. That API specifies a single JSON object as the body, which is not a streamable format, and I can't see any provision for streaming in that endpoint at all.
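
For reference, the body that endpoint expects is indeed one JSON document covering the whole batch, which is why the entire payload must fit in a single HTTP request. A sketch of its shape in Python (top-level field names per the tabledata.insertAll REST docs linked above; the row contents are hypothetical):

    # The whole batch is serialized into a single JSON object like this;
    # there is no way to stream the rows over the wire incrementally.
    payload = {
        "kind": "bigquery#tableDataInsertAllRequest",
        "rows": [
            {"insertId": "row-1", "json": {"name": "Alice", "age": 30}},
            {"insertId": "row-2", "json": {"name": "Bob", "age": 25}},
        ],
    }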

What am I missing? Do the SDK developers mean something entirely different by "streaming" than I do? And why does a streaming API have a size limit at all?

Tags: python, google-bigquery

Solution
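
As far as I understand, BigQuery's "streaming" inserts stream rows into the table (rows become available for querying in near real time); they do not stream the HTTP request itself. Each insertAll call is a single request subject to a payload-size quota (roughly 10 MB, consistent with the error above). The usual way around the error is to consume the generator lazily in batches and issue one insert_rows call per batch. A minimal sketch, assuming client and table are set up as in the snippet above; the helper name and batch size are illustrative:

    import itertools

    def insert_rows_in_batches(client, table, rows, batch_size=500):
        """Consume an iterable lazily, issuing one insert_rows call per batch
        so each request stays comfortably under the payload-size limit."""
        iterator = iter(rows)
        while True:
            batch = list(itertools.islice(iterator, batch_size))
            if not batch:
                return
            errors = client.insert_rows(table, batch)
            if errors:
                raise RuntimeError("insert_rows reported errors: %s" % errors)

Note that with very large rows, even 500 per batch could exceed the request limit; in that case batch by serialized size instead, or consider a load job, which has much higher size limits, when near-real-time availability is not required.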

