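python - Converting ordinary JSON to the newline-delimited JSON that BigQuery requires
Problem description
I have a list of more than 100,000 dictionaries.
How can I convert it to JSON and write it to a file in the newline-delimited format that BigQuery requires, like this: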
{"id":"1","first_name":"John","last_name":"Doe","dob":"1968-01-22","addresses":[{"status":"current","address":"123 First Avenue","city":"Seattle","state":"WA","zip":"11111","numberOfYears":"1"},{"status":"previous","address":"456 Main Street","city":"Portland","state":"OR","zip":"22222","numberOfYears":"5"}]}
{"id":"2","first_name":"Jane","last_name":"Doe","dob":"1980-10-16","addresses":[{"status":"current","address":"789 Any Avenue","city":"New York","state":"NY","zip":"33333","numberOfYears":"2"},{"status":"previous","address":"321 Main Street","city":"Hoboken","state":"NJ","zip":"44444","numberOfYears":"3"}]}
instead of this:
[{"id":"1","first_name":"John","last_name":"Doe","dob":"1968-01-22","addresses":[{"status":"current","address":"123 First Avenue","city":"Seattle","state":"WA","zip":"11111","numberOfYears":"1"},{"status":"previous","address":"456 Main Street","city":"Portland","state":"OR","zip":"22222","numberOfYears":"5"}]}, {"id":"2","first_name":"Jane","last_name":"Doe","dob":"1980-10-16","addresses":[{"status":"current","address":"789 Any Avenue","city":"New York","state":"NY","zip":"33333","numberOfYears":"2"},{"status":"previous","address":"321 Main Street","city":"Hoboken","state":"NJ","zip":"44444","numberOfYears":"3"}]}]
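Note the difference between the two: the first has one JSON object per line (newline-delimited), while the second is a single comma-separated JSON array (what an ordinary JSON dump in Python produces for a list). I need the first.
What I did before was this, in the last part of my loop: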
while condition:
    with open('cache/name.json', 'a') as a:
        json_data = json.dumps(store)
        a.write(json_data + '\n')
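Done this way, the file is opened and written to once for every dictionary in the list, which makes the loop slow.
How can I produce the file in the format BigQuery requires in a faster way?
Solution
This format is called NEWLINE_DELIMITED_JSON, and the BigQuery client library has built-in support for loading it. Assuming your JSON file is already in a GCS bucket, you can use the following: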
from google.cloud import bigquery

# Construct a BigQuery client object.
client = bigquery.Client()

# TODO(developer): Set table_id to the ID of the table to create.
table_id = "your-project.your_dataset.your_table_name"

job_config = bigquery.LoadJobConfig(
    schema=[
        bigquery.SchemaField("name", "STRING"),
        bigquery.SchemaField("post_abbr", "STRING"),
    ],
    source_format=bigquery.SourceFormat.NEWLINE_DELIMITED_JSON,
)
uri = "gs://cloud-samples-data/bigquery/us-states/us-states.json"

load_job = client.load_table_from_uri(
    uri,
    table_id,
    location="US",  # Must match the destination dataset location.
    job_config=job_config,
)  # Make an API request.

load_job.result()  # Waits for the job to complete.

destination_table = client.get_table(table_id)
print("Loaded {} rows.".format(destination_table.num_rows))
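The load job above assumes the newline-delimited file already exists. For the writing step the question asks about, one possible approach is to open the file once and write one `json.dumps` result per line, rather than reopening the file for every dictionary. The following is only a minimal sketch: the helper name `write_ndjson`, the sample `records`, and the temporary output path are made up for illustration and are not part of the original question or answer.

```python
import json
import os
import tempfile


def write_ndjson(records, path):
    """Write an iterable of dicts to `path`, one JSON object per line."""
    # Open the file a single time; reopening it per record is what
    # made the loop in the question slow.
    with open(path, "w", encoding="utf-8") as f:
        for record in records:
            # json.dumps escapes newlines inside strings, so each
            # record is guaranteed to occupy exactly one line.
            f.write(json.dumps(record, ensure_ascii=False))
            f.write("\n")


# Hypothetical sample data standing in for the 100,000+ dictionaries.
records = [
    {"id": "1", "first_name": "John", "addresses": [{"city": "Seattle"}]},
    {"id": "2", "first_name": "Jane", "addresses": [{"city": "New York"}]},
]

# Hypothetical output location; the question used 'cache/name.json'.
out_path = os.path.join(tempfile.mkdtemp(), "name.json")
write_ndjson(records, out_path)
```

If the dictionaries are produced inside a loop, the same idea applies: keep the `with open(...)` block outside the loop and write each record as it is produced.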