How do I upload a pandas DataFrame to a GitHub repository as a CSV file?

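Question

I need to deploy a Dash application on a server. I am using GitHub as the data repository: all of the processed data has to be stored on GitHub so that my Dash application can access it.

Every solution I have come across requires saving the DataFrame as a local CSV file and then committing that file to GitHub. That is not possible in my case; I need to commit the DataFrame to GitHub directly as a CSV.

Thanks in advance for your help.

Tags: python, github, github-api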

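Solution

The trick is to convert your pandas DataFrame to text and then upload that text as the content of the file. This answer was very helpful: https://stackoverflow.com/a/50072113/7375722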
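
As a minimal sketch of that idea (the column names and values are made up for illustration): calling `DataFrame.to_csv()` without a file path returns the CSV as a string, and reading that string back through `io.StringIO` gives the same table, so nothing has to touch the local disk.

```python
import io

import pandas as pd

# Illustrative data only
df = pd.DataFrame({"col1": [1, 2, 3], "col2": [4, 5, 6]})

# With no path argument, to_csv returns the CSV text instead of writing a file
csv_text = df.to_csv(index=False)

# Round trip through an in-memory buffer to confirm the text holds the same table
restored = pd.read_csv(io.StringIO(csv_text))

print(csv_text.splitlines()[0])  # header row of the generated CSV
print(restored.equals(df))       # whether the round trip preserved the data
```

The string in `csv_text` is what gets passed to GitHub as the file content.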

Here is the code I am currently using:

#Import required packages
import pandas as pd
from github import Github
from github import InputGitTreeElement
from datetime import datetime

#create test pd df to upload
d = {'col1': [1, 2, 3], 'col2': [4, 5, 6]}
df = pd.DataFrame(d)
#convert pd.df to text. This avoids writing the file as csv to local and again reading it
df2 = df.to_csv(sep=',', index=False)

#list files to upload and desired file names with which you want to save on GitHub
file_list = [df2,df2]
file_names = ['Test.csv','Test2.csv']

#Specify commit message
commit_message = 'Test Python'

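#Create connection with GitHub
#Note: the GitHub API no longer accepts account passwords; supply a personal access token as the password value
user = "{your-user-id}"
password = "{your-password}"
g = Github(user,password)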

#Get list of repos
for repo in g.get_user().get_repos():
    print(repo.name)

#Create connection with desired repo
repo = g.get_user().get_repo('{your-repo-name}')

#Check files under the selected repo
x = repo.get_contents("")
for labels in x:
    print(labels)
x = repo.get_contents("Test.csv") #read a specific file from your repo (raises an error if the file does not exist yet)

#Get available branches in your repo
x = repo.get_git_refs()
for y in x:
    print(y)
# output eg:- GitRef(ref="refs/heads/master")

#Select required branch where you want to upload your file.
master_ref = repo.get_git_ref("heads/master")

#Finally, putting everything in a function to make it re-usable

def updategitfiles(file_names,file_list,userid,pwd,Repo,branch,commit_message =""):
    #Fall back to a timestamped commit message when none is given
    if commit_message == "":
        commit_message = "Data Updated - "+ datetime.now().strftime('%Y-%m-%d %H:%M:%S')

    g = Github(userid,pwd)
    repo = g.get_user().get_repo(Repo)
    #Resolve the branch head and the tree it currently points to
    master_ref = repo.get_git_ref("heads/"+branch)
    master_sha = master_ref.object.sha
    base_tree = repo.get_git_tree(master_sha)
    #One tree element per file: path, mode 100644 (regular file), type blob, content as text
    element_list = list()
    for file_name, file_content in zip(file_names, file_list):
        element = InputGitTreeElement(file_name, '100644', 'blob', file_content)
        element_list.append(element)
    #Build a new tree on top of the existing one, commit it, and move the branch to the new commit
    tree = repo.create_git_tree(element_list, base_tree)
    parent = repo.get_git_commit(master_sha)
    commit = repo.create_git_commit(commit_message, tree, [parent])
    master_ref.edit(commit.sha)
    print('Update complete')

updategitfiles(file_names,file_list,user,password,'{your-repo-name}','{your-branch-name}')
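
To close the loop for the Dash app, the uploaded file can be read back into a DataFrame without a local copy. The sketch below assumes that `repo` is a PyGithub repository object like the one created above and that the path points to a single small file rather than a directory. `load_csv_from_repo` is a hypothetical helper name, not part of PyGithub.

```python
import io

import pandas as pd


def load_csv_from_repo(repo, path, branch):
    """Read a CSV file from a GitHub repo into a DataFrame (hypothetical helper)."""
    # get_contents returns a ContentFile for a file path; ref selects the branch
    contents = repo.get_contents(path, ref=branch)
    # decoded_content holds the file's raw bytes; wrap them in a buffer for pandas
    return pd.read_csv(io.BytesIO(contents.decoded_content))
```

For example, `df = load_csv_from_repo(repo, 'Test.csv', '{your-branch-name}')` would load the file committed by `updategitfiles`.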

