Reading a pandas pickle file with TensorFlow on CloudML

Problem description

I'm getting an error when trying to read a pandas pickle (stored with e.g. df.to_pickle()) from Google Cloud Storage. I'm attempting the following:

path_to_gcs_file = 'gs://xxxxx'
f = file_io.FileIO(path_to_gcs_file, mode='r').read()
train_df = pd.read_pickle(f)
f.close()

I get the following error:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position 0: invalid start byte

Alternatively, I tried:

f = BytesIO(file_io.read_file_to_string(path_to_gcs_file, binary_mode=True))
train_df = pd.read_pickle(f)

which works locally, but not on CloudML!

f = file_io.read_file_to_string(path_to_gcs_file, binary_mode=True)
train_df = pd.read_pickle(f)

which gives me the error: AttributeError: 'bytes' object has no attribute 'seek'
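As background (not stated in the original question), both errors have mechanical explanations: a pickle stream under protocol 2+ begins with byte 0x80, which is invalid UTF-8, so opening the file in text mode ('r') fails; and pd.read_pickle needs a path or a seekable file-like object, which raw bytes are not. A minimal standard-library sketch illustrating both:

```python
import io
import pickle

payload = pickle.dumps({"a": 1})

# Pickle protocol 2+ starts with byte 0x80, which is not valid UTF-8 --
# hence the UnicodeDecodeError when the file is opened in text mode.
assert payload[0] == 0x80
try:
    payload.decode("utf-8")
    decoded_as_text = True
except UnicodeDecodeError:
    decoded_as_text = False
assert not decoded_as_text

# Raw bytes have no seek(); pickle.load (and pd.read_pickle) need a
# seekable file-like object -- hence the AttributeError.
assert not hasattr(payload, "seek")
buf = io.BytesIO(payload)  # wrapping the bytes in BytesIO supplies seek()
assert hasattr(buf, "seek")
assert pickle.load(buf) == {"a": 1}
```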

Tags: python, pandas, tensorflow, google-cloud-ml

Solution


You should be able to get away with using a context manager, but I think you're running into credential issues pulling the file that way, so you should download the file via the API instead:

pip install --upgrade google-cloud-storage

然后

from google.cloud import storage
import pickle

# Initialise a client
storage_client = storage.Client("[Your project name here]")
# Create a bucket object for our bucket
bucket = storage_client.get_bucket(bucket_name)
# Create a blob object from the filepath
blob = bucket.blob("folder_one/foldertwo/filename.extension")
# Download the file to a local destination (a path on the local
# filesystem, not the gs:// URI)
blob.download_to_filename(local_path)
with open(local_path, "rb") as f:
    train_df = pickle.load(f)
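Once the file is on the local filesystem, the open-then-pickle.load pattern above is standard. A self-contained sketch of the same round trip using only the standard library (a plain dict stands in for the DataFrame so the sketch carries no pandas or GCS dependency):

```python
import os
import pickle
import tempfile

# Stand-in for the object normally stored with df.to_pickle()
obj = {"col": [1, 2, 3]}

# Write it the way to_pickle would: as a binary pickle file
fd, local_path = tempfile.mkstemp()
os.close(fd)
with open(local_path, "wb") as f:
    pickle.dump(obj, f)

# The load pattern from the answer: open in binary mode, then pickle.load
with open(local_path, "rb") as f:
    restored = pickle.load(f)

os.remove(local_path)
print(restored == obj)  # True
```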

Much of this comes from this answer: Downloading a file from Google Cloud Storage inside a folder

