首页 > 解决方案 > 如何运行 python 脚本来生成/更新 S3 存储桶中的 JSON 文件?

问题描述

我的问题:如何运行 python 脚本来生成/更新 S3 存储桶中的 JSON 文件?另外,我是否应该在每次将文件添加到我的存储桶或有人访问我的网页时运行此脚本?

请继续阅读以获取说明...

我有一个托管在 AWS 上的网站,我的静态页面位于公共 S3 存储桶中。目的是与音乐学生分享我写的音乐。我有一个 Python 脚本,它扫描同一个 S3 存储桶中文件夹中的所有对象(乐谱 PDF)。然后,Python 脚本会创建一个 JSON 文件,其中包含所有 s3 对象的名称以及为每个对象生成的 URL

下面是 JSON 文件的格式:

{
  "bass": [{
      "bass-song1": "http://www.website.com/bass-song1.pdf"
    }, {
      "bass-song2": "http://www.website.com/bass-song2.pdf"
    }],
  "drum": [{
      "drum-song1": "http://www.website.com/drum-song1.pdf"
    }, {
      "drum-song2": "http://www.website.com/drum-song2.pdf"
    }],
  "guitar": [{
      "guitar-song1": "http://www.website.com/guitar-song1.pdf"
    }, {
      "guitar-song2": "http://www.website.com/guitar-song2.pdf"
    }]
}

我的工作 Python 脚本,供参考:

import boto3
import json
import pprint

# This program creates a json file
# with temporary URLs for bucket objects, organized by folder
# For use with javascript that generates links to bucket objects for website

# create a session to retrieve credentials from ~/.aws/credentials
session = boto3.Session(profile_name='<MY PROFILE>')

# use your credentials to create a low-level client with the s3 service
s3 = session.client('s3')

# Store dictionary of objects from bucket that starts with the prefix '__'
response = s3.list_objects_v2(Bucket='<MY BUCKET>', Prefix='<FOLDER IN BUCKET>')

folder_list = []
url_json = {}

# (the value of 'Contents' is a list of dictionaries)
# For all the dictionaries in the 'Contents' list,
# IF they don't end with '/' (meaning, if its not a directory)...
for i in response['Contents']:
    if i['Key'].endswith('/') != True:

        # get the directory of the current Key, save as string in 'dir'
        full_path = i['Key']
        # this retrieves the folder after '__'
        dir = full_path.split("/")[1]
        # capitalize the directory name, update variable
        dir = dir.capitalize()
        # this retrieves the file name
        filename = full_path.split("/")[2]

        # if the name of the directory ('dir') is not in folder_list:
        # add it to folder_list,
        # and add an item to dictionary 'url_json' where key is current 'dir' and value is empty list
        if dir not in folder_list:
            folder_list.append(dir)
            url_json[folder_list[-1]] = []

        # generate a temporary URL for the current bucket object
        url = s3.generate_presigned_url('get_object', Params={'Bucket':'<MY BUCKET>', 'Key':i['Key']}, ExpiresIn=3600)

        # create a dictionary for each bucket object
        # store the object's name (Key) and URL (value)
        object_dict = {filename:url}

        # Append the newly created URL to a list in the 'url_json' dictionary,
        # whose key is the last directory in 'folder_list'
        url_json[folder_list[-1]].append(object_dict)

# Dump content of 'url_json' directory to 'urls.json' file
# if it already exists, overwrite it
with open('url_list.json', mode='w') as outfile:
    json.dump(url_json, outfile)

此外,在我的网页正文中,我编写了一些内嵌 JavaScript 来加载/读取 JSON 文件,并使用 URL 及其相关文本创建链接,供人们访问我网页上的这些文件。

我的工作 JavaScript,供参考:

<script type="text/javascript">

async function getData(url) {
  const response = await fetch(url);
  return response.json()
}

async function main() {
  const data = await getData('<JSON FILE URL>');

  // 'instrument' example: 'Bass'
  for (var instrument in data) {

    var h = document.createElement("H3"); // Create the H1 element
    var t = document.createTextNode(instrument); // Create a text element
    h.appendChild(t); // Append the text node to the H1 element
    document.body.appendChild(h); // Append the H1 element to the document body

    // store the list of bass songs
    var song_list = data[instrument]
    // for each song (list element),
    for (var song_object of song_list) {
      // for every song name in the object (in python, dictionary)
      for (var song_name in song_object) {
        // create a var with the name and URL of the PDF song file
        var link_str = song_name;
        var link_url = song_object[song_name];

        // Create link to appear on website
        const a = document.createElement("a");
        var lineBreak = document.createElement("br");
        a.href = link_url;
        a.innerText = link_str;
        a.style.backgroundColor="rgba(255, 255, 255, 0.7)"
        document.body.appendChild(a);
        document.body.append(lineBreak);
      }
    }
  }
}
main();

</script>

所以我的最终目标是: 我将一个 PDF 文件上传到我的 S3 存储桶,我的 Python 脚本执行以更新我的 S3 存储桶中的 JSON 文件,当有人访问我的网页时,JavaScript 将解析 JSON 文件以创建指向这些 PDF 的链接文件。当我将文件上传到存储桶或有人访问网站时,我应该让 python 脚本创建 JSON 吗?问题是这些对象 URL 有一个时间到期......但我也不希望每次有人加载页面时都运行代码(出于 $$ 原因)。

任何帮助将不胜感激。如果您需要我的更多信息,请告诉我。谢谢你,塞巴斯蒂安。

标签: javascriptpythonhtmljsonamazon-s3

解决方案


推荐阅读