Running queries in Google Colab + BigQuery

Question

I followed the steps here and inserted this snippet step by step:

https://colab.research.google.com/notebook#snippetFileIds=%2Fv2%2Fexternal%2Fnotebooks%2Fsnippets%2Fbigquery.ipynb&snippetQuery=Using%20BigQuery%20with%20Pandas%20API

The query runs, but then I get this error:

TypeError                                 Traceback (most recent call last)
<ipython-input-22-b9e37aa67e26> in <module>()
      9     COUNT(*) as total
     10   FROM `bigquery-public-data.samples.gsod`
---> 11 ''', project_id=project_id).total[0]
     12 
     13 df = pd.io.gbq.read_gbq(f'''

8 frames
/usr/local/lib/python3.6/dist-packages/pyarrow/table.pxi in pyarrow.lib.RecordBatch.from_arrays()

TypeError: from_arrays() takes at least 2 positional arguments (1 given)

I have tried several databases, without success.

Any ideas?

Tags: google-bigquery

Answer


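I followed the steps in the Using BigQuery with Pandas API Colab snippet, and it worked fine for me. First, if you do not already have a Cloud Platform project, you need to create one, then enable billing and the BigQuery API.

When you run the first piece of code, click the link shown in the output console, copy the verification code, and paste it into the Enter verification code field in that console: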

from google.colab import auth
auth.authenticate_user()

Before running the second piece of code, change the value of the project_id variable to the ID of the project you actually created in GCP:

import pandas as pd

# https://cloud.google.com/resource-manager/docs/creating-managing-projects
project_id = 'your Cloud Platform project ID'
sample_count = 2000

row_count = pd.io.gbq.read_gbq('''
  SELECT 
    COUNT(*) as total
  FROM `bigquery-public-data.samples.gsod`
''', project_id=project_id).total[0]

df = pd.io.gbq.read_gbq(f'''
  SELECT
    *
  FROM
    `bigquery-public-data.samples.gsod`
  WHERE RAND() < {sample_count}/{row_count}
''', project_id=project_id)

print(f'Full dataset has {row_count} rows')
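
The WHERE RAND() < {sample_count}/{row_count} filter in the second query keeps each row with probability sample_count / row_count, so the result holds roughly sample_count rows rather than exactly that many. A minimal sketch of that arithmetic follows. Note that build_sample_query is a hypothetical helper (not part of pandas or BigQuery), and the row count of 1,000,000 is a made-up number used only for illustration:

```python
def build_sample_query(table: str, sample_count: int, row_count: int) -> str:
    """Build a query that keeps each row with probability sample_count / row_count.

    Hypothetical helper for illustration; it only formats a string and
    does not contact BigQuery.
    """
    if row_count <= 0:
        raise ValueError("row_count must be positive")
    # Cap the fraction at 1.0 so a table smaller than sample_count is returned in full.
    fraction = min(1.0, sample_count / row_count)
    # Fixed-point formatting avoids scientific notation for very small fractions.
    return (
        "SELECT *\n"
        f"FROM `{table}`\n"
        f"WHERE RAND() < {fraction:.10f}"
    )


# Made-up row count: 2000 / 1,000,000 gives a sampling fraction of 0.002.
query = build_sample_query("bigquery-public-data.samples.gsod", 2000, 1_000_000)
print(query)
```

The resulting string can be passed to pd.io.gbq.read_gbq in place of the f-string above. Because RAND() is evaluated independently for every row, the number of rows returned varies from run to run.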

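After that, the cell prints a line of the form "Full dataset has N rows", where N is the total row count returned by the first query.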
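
If the same snippet still fails with the TypeError from the question, one thing that may be worth checking is which library versions the runtime has installed, since the traceback ends inside pyarrow. This is an assumption about the cause, not something confirmed above. The sketch below only reports versions; installed_versions is a hypothetical helper, and importlib.metadata requires Python 3.8 or later:

```python
from importlib import metadata

# Packages involved in pd.io.gbq.read_gbq; adjust the list as needed.
PACKAGES = ("pandas", "pandas-gbq", "google-cloud-bigquery", "pyarrow")


def installed_versions(packages=PACKAGES):
    """Return a dict mapping each package name to its version, or None if missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


for name, version in installed_versions().items():
    print(f"{name}: {version or 'not installed'}")
```

If pyarrow turns out to be much older than google-cloud-bigquery, upgrading it with pip and restarting the runtime might be worth a try, though whether that resolves this particular error is untested here.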

I hope this helps.

