How to import xlsx into DynamoDB using Python and boto3

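Problem description

I tried to follow the LinuxAcademy post on importing Excel data into DynamoDB, but the code was published two years ago and no longer works. Any tips or suggestions would be very helpful.

Sorry, I'm new to Stack Overflow.

I'm trying to take an Excel spreadsheet, convert it to JSON, and then upload it to DynamoDB, as in the LinuxAcademy post. The instructions are old, and they use three scripts to upload a single file.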

Tags: python-3.x

Solution


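Here is the code I used to create the AWS Lambda function in Python.

The only problem is that it reads the whole Excel file and converts it to JSON, and my file is too large to be ingested into DynamoDB before the 5-minute Lambda timeout. I may convert it to a Step Functions workflow, but this worked for me.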

import sys
import urllib.parse
import uuid

import boto3
import pandas as pd

s3_client = boto3.client('s3')


def upload_to_dynamodb(report):
    # Recent pandas versions need the openpyxl package to read .xlsx files
    df = pd.read_excel(report)
    # NOTE: "STREET_NUMBER" appears twice in this list; when rows are turned into
    # dictionaries, the second value overwrites the first
    df.columns = [
        "APPLICATION", "FORM_NUMBER", "FILE_DATE", "STATUS_DATE", "STATUS",
        "STATUS_CODE", "EXPIRATION_DATE", "ESTIMATED COST", "REVISED_COST",
        "EXISTING_USE", "EXISTING_UNITS", "PROPOSED_USE", "PROPOSED_UNITS",
        "PLANSETS", "15_DAY_HOLD?", "EXISTING_STORIES", "PROPOSED_STORIES",
        "ASSESSOR_STORIES", "VOLUNTARY", "PAGES", "BLOCK", "LOT",
        "STREET_NUMBER", "STREET_NUMBER_SFX", "AVS_STREET_NAME", "AVS_STREET_SFX",
        "UNIT", "UNIT_SFX", "FIRST_NAME", "LAST_NAME", "CONTRACTORPHONE",
        "COMPANY_NAME", "STREET_NUMBER", "STREET", "STREET_SUFFIX", "CITY",
        "STATE", "ZIP_CODE", "CONTACT_NAME", "CONTACT_PHONE", "DESCRIPTION",
    ]
    # Clean up the data and convert every column to strings to be on the safe side.
    # The regex replace turns every hyphen inside a string cell into "0".
    df = df.replace({'-': '0'}, regex=True)
    df = df.fillna(0)
    df = df.astype(str)
    # Convert the dataframe to a list of dictionaries (one per row, JSON-like)
    # that can be consumed by any NoSQL database
    items = df.to_dict(orient='records')
    # Connect to DynamoDB using boto3
    resource = boto3.resource('dynamodb', region_name='us-west-2')
    # Connect to the DynamoDB table
    table = resource.Table('permitdata')
    # Load the rows one by one using the put_item method
    for permit in items:
        table.put_item(Item=permit)


def handler(event, context):
    for record in event['Records']:
        print(record)
        bucket = record['s3']['bucket']['name']
        print(bucket)
        # Object keys in S3 event notifications are URL-encoded
        key = urllib.parse.unquote_plus(record['s3']['object']['key'])
        print(key)
        # Drop slashes so a key with a prefix still maps to a single file under /tmp
        download_path = '/tmp/{}{}'.format(uuid.uuid4(), key.replace('/', ''))

        s3_client.download_file(bucket, key, download_path)
        upload_to_dynamodb(download_path)


def main():
    # Local test run: build a minimal S3 event from the bucket name and
    # object key given on the command line
    event = {'Records': [{'s3': {'bucket': {'name': sys.argv[1]},
                                 'object': {'key': sys.argv[2]}}}]}
    handler(event, None)


if __name__ == "__main__":
    main()
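
Since the answer runs into the Lambda timeout while writing rows one at a time, one thing that may help before moving to Step Functions is boto3's `batch_writer()`, which buffers puts into batched requests and resends unprocessed items. The following is only a minimal sketch: the helper name `write_items` is made up for illustration, and `table` is assumed to be a boto3 DynamoDB `Table` resource such as the `permitdata` table above.

```python
def write_items(table, items):
    """Write all items through a batch writer and return how many were queued.

    `table` is assumed to be a boto3 DynamoDB Table resource, for example
    boto3.resource('dynamodb', region_name='us-west-2').Table('permitdata').
    `write_items` itself is a hypothetical helper, not part of boto3.
    """
    count = 0
    # batch_writer() groups puts into BatchWriteItem requests and flushes
    # whatever is still buffered when the with-block exits.
    with table.batch_writer() as batch:
        for item in items:
            batch.put_item(Item=item)
            count += 1
    return count
```

In the answer's code this would stand in for the final `for permit in ...: table.put_item(Item=permit)` loop. Whether it is fast enough depends on the table's write capacity and the size of the spreadsheet.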

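The timeout mentioned in the answer can also be handled by stopping shortly before the deadline and reporting how far the load got, so that a follow-up invocation (for example a Step Functions state, as the answer suggests) can resume from that row. This is a rough sketch under assumptions: `write_until_deadline` and `reserve_ms` are hypothetical names, `table` is assumed to be a boto3 DynamoDB `Table` resource, and `context` is the Lambda context object, whose `get_remaining_time_in_millis()` method reports the time left in the current invocation.

```python
def write_until_deadline(table, items, context, reserve_ms=10_000):
    """Write items until the invocation is close to timing out.

    Returns the number of items handed to the batch writer, which is also
    the index of the first item that still needs to be written.
    `reserve_ms` is an assumed safety margin that leaves time for the
    final flush when the with-block exits.
    """
    written = 0
    with table.batch_writer() as batch:
        for item in items:
            # Stop early instead of being killed by the Lambda timeout
            if context.get_remaining_time_in_millis() < reserve_ms:
                break
            batch.put_item(Item=item)
            written += 1
    return written
```

A caller could pass `items[start:]` with `start` taken from the previous result. Note that Lambda currently allows timeouts of up to 15 minutes, so raising the function's timeout may already be enough for moderately large files.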