How to run a BigQuery SQL query in a Python Jupyter notebook

Problem description

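I am trying to run a SQL query against Google BigQuery from a Jupyter notebook. I followed everything described here: https://cloud.google.com/bigquery/docs/bigquery-storage-python-pandas#download_query_results_using_the_client_library. I created a client account and downloaded the JSON file. Now I am trying to run this script: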

from google.cloud import bigquery

bqclient = bigquery.Client('c://folder/client_account.json')

# Download query results.
query_string = """
SELECT * from `project.dataset.table`
"""

dataframe = (
    bqclient.query(query_string)
    .result()
    .to_dataframe(
        # Optionally, explicitly request to use the BigQuery Storage API. As of
        # google-cloud-bigquery version 1.26.0 and above, the BigQuery Storage
        # API is used by default.
        create_bqstorage_client=True,
    )
)
print(dataframe.head())

But I keep getting this error message:

DefaultCredentialsError: Could not automatically determine credentials. Please set GOOGLE_APPLICATION_CREDENTIALS or explicitly create credentials and re-run the application. For more information, please see https://cloud.google.com/docs/authentication/getting-started

I don't understand what I am doing wrong, because the JSON file looks fine and the path to the file is correct.

Tags: python, google-bigquery, jupyter-notebook

Solution


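The error means the client library could not find any application credentials in your environment. The first positional argument of bigquery.Client is the project ID, not a path to a key file, so bigquery.Client('c://folder/client_account.json') never loads the JSON file. With no credentials passed in, the library falls back to looking up default credentials (for example via GOOGLE_APPLICATION_CREDENTIALS), finds none, and raises DefaultCredentialsError.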
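One way to satisfy the lookup that the error message describes is to set GOOGLE_APPLICATION_CREDENTIALS from inside the notebook before the client is created. The following is a minimal sketch, not the only way to do it: the helper name point_adc_at_key is made up for illustration, and the key path in the trailing comments is a placeholder.

```python
import os
from pathlib import Path


def point_adc_at_key(key_path: str) -> str:
    """Set GOOGLE_APPLICATION_CREDENTIALS to an existing key file.

    Hypothetical helper for illustration. Returns the absolute path that
    was stored in the environment variable.
    """
    path = Path(key_path).expanduser().resolve()
    # Fail early with a clear message instead of a credentials error later.
    if not path.is_file():
        raise FileNotFoundError(f"No key file at {path}")
    os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = str(path)
    return str(path)


# In the notebook, before creating the client (placeholder path):
# point_adc_at_key("path/to/service_account.json")
# from google.cloud import bigquery
# bqclient = bigquery.Client()  # credentials are found via the variable
```

The variable only affects the current kernel process, so it has to be set again after a kernel restart.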

To authenticate explicitly with a service account, use the following approach:

from google.cloud import bigquery
from google.oauth2 import service_account


# TODO(developer): Set key_path to the path to the service account key
#                  file.
key_path = "path/to/service_account.json"

credentials = service_account.Credentials.from_service_account_file(
    key_path, scopes=["https://www.googleapis.com/auth/cloud-platform"],
)

bqclient = bigquery.Client(credentials=credentials, project=credentials.project_id)

query_string = """
SELECT * from `project.dataset.table`
"""

dataframe = (
    bqclient.query(query_string)
    .result()
    .to_dataframe(
        # Optionally, explicitly request to use the BigQuery Storage API. As of
        # google-cloud-bigquery version 1.26.0 and above, the BigQuery Storage
        # API is used by default.
        create_bqstorage_client=True,
    )
)
print(dataframe.head())
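If loading the key still fails, it can help to confirm that the downloaded JSON really is a service account key before handing it to the client. The sketch below is an assumption-laden sanity check rather than anything the BigQuery library provides: check_service_account_key and REQUIRED_FIELDS are hypothetical names, and the field list covers only some of the fields a standard service account key file usually contains.

```python
import json
from pathlib import Path

# Assumption: a subset of the fields a standard service account key contains.
REQUIRED_FIELDS = ("type", "project_id", "private_key", "client_email")


def check_service_account_key(key_path: str) -> list:
    """Return a list of problems found in the key file (empty if none).

    Hypothetical helper for illustration; it does not contact Google.
    """
    path = Path(key_path)
    if not path.is_file():
        return [f"file not found: {path}"]
    try:
        info = json.loads(path.read_text(encoding="utf-8"))
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc}"]
    if not isinstance(info, dict):
        return ["top-level JSON value is not an object"]

    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS
                if name not in info]
    # A different "type" usually means another kind of credential file.
    if "type" in info and info["type"] != "service_account":
        problems.append(f"unexpected type: {info['type']!r}")
    return problems


# Usage (placeholder path):
# print(check_service_account_key("path/to/service_account.json"))
```

An empty list only says the file has the expected shape; whether the key is still valid and has BigQuery permissions is decided when the query runs.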
