Export nested BigQuery data to Cloud Storage using Python

Problem Description

I am trying to export BigQuery data to Cloud Storage, but the job fails with the error "400 Operation cannot be performed on a nested schema. Field: event_params".

Here is my code:

import os
from google.cloud import bigquery

# Credentials must be set before the client is created for them to take effect.
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "/Users/Nitin/Desktop/big_query_test/soy-serty-897-ed73.json"
client = bigquery.Client()
bucket_name = "soy-serty-897.appspot.com"
project = "soy-serty-897"
dataset_id = "analytics_157738"
table_id = "events_20190326"

destination_uri = 'gs://{}/{}'.format(bucket_name, 'basket.csv')
dataset_ref = client.dataset(dataset_id, project=project)
table_ref = dataset_ref.table(table_id)

extract_job = client.extract_table(
    table_ref,
    destination_uri,
    # Location must match that of the source table.
    location='US')  # API request
extract_job.result()  # Waits for job to complete.

print('Exported {}:{}.{} to {}'.format(
    project, dataset_id, table_id, destination_uri))

Tags: python, google-bigquery

Solution


The BigQuery export limitations state that nested and repeated data are not supported for CSV exports. So export to Avro or newline-delimited JSON instead:

from google.cloud import bigquery
client = bigquery.Client()
bucket_name = 'your_bucket'
project = 'bigquery-public-data'
dataset_id = 'samples'
table_id = 'shakespeare'

destination_uri = 'gs://{}/{}'.format(bucket_name, '<your_file>')
dataset_ref = client.dataset(dataset_id, project=project)
table_ref = dataset_ref.table(table_id)
configuration = bigquery.job.ExtractJobConfig()
# For Avro:
configuration.destination_format = 'AVRO'
# For JSON, use this line instead:
# configuration.destination_format = 'NEWLINE_DELIMITED_JSON'

extract_job = client.extract_table(
    table_ref,
    destination_uri,
    job_config=configuration,
    location='US')  # location must match that of the source table
extract_job.result()  # waits for the job to complete
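
One caveat worth adding (not part of the original answer): if the table is larger than 1 GB, BigQuery requires a wildcard in the destination URI so the export can be sharded across several files, and the output can optionally be compressed. Below is a minimal sketch under those assumptions, reusing the public shakespeare table and a placeholder bucket name; the google-cloud-storage listing at the end is just a sanity check:

from google.cloud import bigquery, storage

client = bigquery.Client()
bucket_name = 'your_bucket'  # placeholder: your own GCS bucket

dataset_ref = client.dataset('samples', project='bigquery-public-data')
table_ref = dataset_ref.table('shakespeare')

# The wildcard '*' lets BigQuery shard the export; it is required
# for tables larger than 1 GB.
destination_uri = 'gs://{}/shakespeare-*.json.gz'.format(bucket_name)

configuration = bigquery.job.ExtractJobConfig()
configuration.destination_format = 'NEWLINE_DELIMITED_JSON'
configuration.compression = 'GZIP'  # GZIP is supported for JSON exports

extract_job = client.extract_table(
    table_ref,
    destination_uri,
    job_config=configuration,
    location='US')
extract_job.result()

# Sanity check: list what landed in the bucket.
for blob in storage.Client().list_blobs(bucket_name, prefix='shakespeare-'):
    print(blob.name, blob.size)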

Hope this helps.

