python - How do I download multiple text files from Google Drive and append them to a Pandas data frame?
Question
I am trying to download four text files from Google Drive into a single Pandas data frame for analysis. Here is my code:
# Import Pandas and other stuff
import pandas as pd
import numpy as np
import datetime as dt
from matplotlib import pyplot as plt
# Setup Google Drive access - code to read csv file into Colaboratory:
!pip install -U -q PyDrive
from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive
from google.colab import auth
from oauth2client.client import GoogleCredentials
# Authenticate and create the PyDrive client.
auth.authenticate_user()
gauth = GoogleAuth()
gauth.credentials = GoogleCredentials.get_application_default()
drive = GoogleDrive(gauth)
# Import weather data from Google Drive
dataFiles = [['https://drive.google.com/open?id=1w3CRxNbIYDXhEgkqwn8BB78C9O2WWLKi','Environmental_Data_Deep_Moor_2012.txt'],
['https://drive.google.com/open?id=1_aHbOnVIOHWUMjIKY9cL3w-0qbwqtZRE','Environmental_Data_Deep_Moor_2013.txt'],
['https://drive.google.com/open?id=1cQOB_jdOEgOtjq1qllBsagGRSzKW_Nii','Environmental_Data_Deep_Moor_2014.txt'],
['https://drive.google.com/open?id=17f-0D0y_n4PpAu_M674amFYL9AnExLod','Environmental_Data_Deep_Moor_2015.txt']]
# Create empty array for file ID numbers
fileIDs =[]
# Split up the file URL to fetch the file ID and download into dataframes
for i in range(0,len(dataFiles)):
    fluff, id = dataFiles[i][0].split('=')
    fileIDs.append(id)
    # If this is the first file being loaded, create a new dataframe, otherwise append:
    downloaded[i] = drive.CreateFile({'id':id})
    downloaded[i].GetContentFile(dataFiles[i][1])
    df_append = pd.read_csv(dataFiles[i][1], sep="\t")
    df_weather.append(df_append)
    df_append.head()
    print("File ID: {} loaded. There are {} total lines loaded into the df_weather data frame.".format(fileIDs[i],len(df_weather)))
Only the first file seems to be loaded into the data frame. Any ideas why the subsequent files are not being loaded?
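Answer
Found the problem: df_weather.append(df_append) returns a new data frame instead of modifying df_weather in place, so I needed to assign the result back to df_weather. Here is my code: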
# Create an empty list for the file IDs and an empty data frame (df_weather)
# for the weather data
fileIDs =[]
df_weather = pd.DataFrame()
# Split up the file URL to fetch the file ID and download into dataframes
for i in range(0,len(dataFiles)):
    fluff, id = dataFiles[i][0].split('=')
    fileIDs.append(id)
    # Download the file, read it, and append its rows to df_weather
    downloaded = drive.CreateFile({'id':id})
    downloaded.GetContentFile(dataFiles[i][1])
    df_append = pd.read_csv(dataFiles[i][1], sep="\t")
    # append() returns a new data frame, so the result must be assigned back.
    # Note: DataFrame.append was removed in pandas 2.0; this line needs pandas < 2.0.
    df_weather = df_weather.append(df_append)
    print("File ID: {} loaded. There are {} total lines loaded into the df_weather data frame.".format(fileIDs[i],len(df_weather)))
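On pandas 2.0 and later, DataFrame.append is no longer available, so the same idea might be written with pd.concat instead. The following is only a minimal sketch under stated assumptions: the file names, column names, and values in sample_files are hypothetical stand-ins for the tab-separated files that GetContentFile() writes to disk, and io.StringIO is used so the snippet runs without Google Drive access.

```python
import io

import pandas as pd

# Hypothetical stand-ins for the downloaded tab-separated files. In the
# Colab code above, pd.read_csv would be given the file paths instead.
sample_files = {
    "weather_2012.txt": "date\tAir_Temp\n2012_01_01\t34.3\n2012_01_02\t35.1\n",
    "weather_2013.txt": "date\tAir_Temp\n2013_01_01\t30.0\n",
}

# Collect one data frame per file, then combine them in a single step.
frames = []
for name, text in sample_files.items():
    frames.append(pd.read_csv(io.StringIO(text), sep="\t"))

# pd.concat also returns a new data frame, so the result must be assigned.
df_weather = pd.concat(frames, ignore_index=True)
print("Loaded {} lines from {} files.".format(len(df_weather), len(frames)))
```

Collecting the frames in a list and concatenating once avoids copying the growing data frame on every loop iteration, and ignore_index=True gives the combined frame a fresh 0..n-1 index rather than repeating each file's row numbers.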