How do I upload a DataFrame to Google Cloud Storage (bucket) in Python 3?

Problem description

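I want to create a Cloud Function (it should run every day at 01:00). The function should:

  1. Generate a DataFrame
  2. [Export it as dataframe.csv] <---- not sure whether this is needed
  3. Push the DataFrame (or the .csv) to a bucket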

......

Updated code (still raises an error):

def push_cars( data ):    ##  <<----- not sure how many parameters & why??

    import requests
    import pandas as pd
    import os
    from datetime import datetime

    from google.cloud.storage.blob import Blob
    from google.cloud import storage
    #import csv               # <<--- not sure if required???


    cars_dict = {'Brand': ['Honda Civic','Toyota Corolla','Ford Focus','Audi A4'],
        'Price': [22000,25000,27000,35000]
        }

    cars = pd.DataFrame(cars_dict, columns = ['Brand', 'Price'])

    timestamp = datetime.now().strftime("%Y_%m_%d-%H_%M_%S")
    name = "cars_" + timestamp + ".csv"

    cars.to_csv(  "/tmp/test.csv" ,index=False)
    with open('/tmp/test.csv', "w") as csv: 
      csv.write(name) 

    os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = "My-project.json"

    target_bucket = 'cars:python_gogo'


    storage_client = storage.Client()
    bucket         = storage_client.get_bucket(  target_bucket )
    data           = bucket        .blob(        name_output   )


To reproduce this in the cloud, you need to create a requirements.txt with the following contents:

requests
pandas
google-cloud-storage
datetime

In Cloud Shell, I deploy this Cloud Function with: gcloud functions deploy push_cars --entry-point=push_cars --runtime=python37 --memory=1024MB --region=us-east1 --allow-unauthenticated --trigger-http

Tags: python, google-cloud-functions, google-cloud-storage

Solution


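Question 1:

The client library cannot upload a DataFrame object directly. The data has to be serialized first, for example to the .csv file you mention, and that file can then be written to the Google Cloud Storage bucket. So step 2 is required.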
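
The serialization step can be sketched on its own as follows. This is a minimal sketch, not the answer's exact code: the helper name dataframe_to_csv_file and its directory parameter are hypothetical, and df is assumed to be a pandas DataFrame (or anything else with a pandas-style to_csv method). The file name test.csv is taken from the question.

```python
import os
import tempfile


def dataframe_to_csv_file(df, directory=None):
    """Write df to test.csv inside directory and return the file's path.

    df is assumed to expose a pandas-style to_csv(path, index=False) method.
    directory defaults to the system temp directory, which is /tmp on Linux.
    """
    directory = directory or tempfile.gettempdir()
    path = os.path.join(directory, "test.csv")
    # index=False leaves the row index out of the file, as in the question.
    df.to_csv(path, index=False)
    return path
```

With the DataFrame from the question, dataframe_to_csv_file(cars) would write the file to the temp directory and return its path, ready for the upload step.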

Question 2:

Once dataframe.csv has been saved to /tmp, you can transfer it to the Google Cloud Storage bucket.

The code to do both would look something like this:
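
The transfer step on its own might look like the sketch below. The helper names open_bucket and upload_file are hypothetical; Bucket.blob and Blob.upload_from_filename are the google-cloud-storage calls it relies on. The client library is imported lazily so that upload_file can be exercised with any bucket-like object.

```python
def open_bucket(bucket_name):
    """Return the bucket called bucket_name, using default credentials."""
    # Imported here so the rest of the module loads without the library;
    # calling this function requires google-cloud-storage to be installed.
    from google.cloud import storage

    return storage.Client().get_bucket(bucket_name)


def upload_file(bucket, local_path, object_name):
    """Upload the file at local_path to bucket under the name object_name."""
    blob = bucket.blob(object_name)        # handle for the destination object
    blob.upload_from_filename(local_path)  # reads the local file and uploads it
    return blob
```

For example, upload_file(open_bucket('sp500_python_gogo'), '/tmp/test.csv', name) would store the CSV under the timestamped name built in the question.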

def push_cars( data, context ):
    # (data, context) is the signature of an event-triggered (background) function.
    # A function deployed with --trigger-http takes a single request argument instead.

    import os
    from datetime import datetime

    import pandas as pd
    from google.cloud import storage


    cars_dict = {'Brand': ['Honda Civic','Toyota Corolla','Ford Focus','Audi A4'],
        'Price': [22000,25000,27000,35000]
        }

    cars = pd.DataFrame(cars_dict, columns = ['Brand', 'Price'])

    # Name of the object in the bucket, e.g. cars_2020_01_31-01_00_00.csv
    timestamp = datetime.now().strftime("%Y_%m_%d-%H_%M_%S")
    name = "cars_" + timestamp + ".csv"

    # Step 2: write the DataFrame to a local file; /tmp is writable in Cloud Functions
    local_path = '/tmp/test.csv'
    cars.to_csv(  local_path ,index=False)

    # Service-account key file, as in the question
    os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = "My-project.json"

    target_bucket = 'sp500_python_gogo'

    # Step 3: upload the local file to the bucket under the timestamped name
    storage_client = storage.Client()
    bucket         = storage_client.get_bucket(  target_bucket )
    blob           = bucket.blob(  name )
    blob.upload_from_filename(  local_path )
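
On the question of how many parameters the function needs: that depends on the trigger. The sketch below shows the two entry-point shapes for the Python runtime side by side. The names export_cars, push_cars_http and push_cars_event are hypothetical, and export_cars is only a placeholder for the real work (build the DataFrame, write the CSV, upload it).

```python
def export_cars():
    """Placeholder for the real work: build the DataFrame, write it, upload it."""
    return "cars exported"


def push_cars_http(request):
    """Entry point for --trigger-http.

    The runtime passes one argument, a Flask request object, and the
    return value becomes the HTTP response body.
    """
    return export_cars()


def push_cars_event(data, context):
    """Entry point for event triggers such as Pub/Sub.

    The runtime passes the event payload (data) and its metadata (context).
    The return value is ignored.
    """
    export_cars()
```

Whichever shape is used, the name given to --entry-point has to match the function name and the trigger flag has to match its signature.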

