python-3.x - How to import an xlsx file into DynamoDB using Python and boto3
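Question
I'm trying to follow the LinuxAcademy post on how to import Excel data into DynamoDB, but the code was published two years ago and no longer works. Any tips or suggestions would be very helpful.
Sorry, I'm new to Stack Overflow.
I'm trying to take an Excel spreadsheet, convert it to JSON, and then upload it to DynamoDB, as the LinuxAcademy post does. The instructions are old, and they use three scripts to upload a single file.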
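Solution
Here is the code I used to create an AWS Lambda function in Python.
The only problem is that it reads the Excel file and converts it to JSON, and the file is too large to ingest into DynamoDB before the 5-minute timeout. I may convert it to a Step Functions workflow, but this worked for me.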
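One possible way to soften the timeout problem described above is to batch the writes and stop cleanly before the function is killed. The following is only a sketch, not the author's method: `write_until_deadline`, `safety_ms`, and the `start` index are hypothetical names introduced here. It relies on two real calls: `table.batch_writer()` from boto3, which buffers `put_item` calls into batch requests, and `context.get_remaining_time_in_millis()` from the Lambda context object. (Lambda's maximum timeout has since been raised to 15 minutes, but a large file can still exceed it.)

```python
def write_until_deadline(table, items, context, start=0, safety_ms=10_000):
    """Write items[start:] in batches and stop early if the Lambda is about to time out.

    Returns the index of the first item that was NOT written, or None when
    every item has been written. safety_ms is a hypothetical safety margin
    that should leave enough time for the final batch to be flushed.
    """
    index = start
    # batch_writer() buffers put_item calls, sends them as batch requests,
    # and flushes whatever is still buffered when the with-block exits.
    with table.batch_writer() as batch:
        while index < len(items):
            if context.get_remaining_time_in_millis() < safety_ms:
                break  # not enough time left; let the caller resume later
            batch.put_item(Item=items[index])
            index += 1
    return index if index < len(items) else None


# Hypothetical use inside a handler (table name and region taken from the answer;
# the 'start' field in the event is an assumption, not an AWS convention):
#   table = boto3.resource('dynamodb', region_name='us-west-2').Table('permitdata')
#   next_index = write_until_deadline(table, items, context, start=event.get('start', 0))
```

The returned index is the kind of value a Step Functions loop could pass back into the next invocation until the function returns None. The author's original function follows.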
import json
import os
import sys
import uuid
from urllib.parse import unquote_plus

import boto3
import pandas as pd

s3_client = boto3.client('s3')
# Not used below: the handler reads the bucket name from each event record
bucket = "serverless-record-storage-lambda"


def upload_to_dynamodb(report):
    df = pd.read_excel(report)
    # NOTE: "STREET_NUMBER" is listed twice. Duplicate names collapse into one
    # key when rows become dicts, so only one of the two values survives.
    df.columns = [
        "APPLICATION", "FORM_NUMBER", "FILE_DATE", "STATUS_DATE", "STATUS",
        "STATUS_CODE", "EXPIRATION_DATE", "ESTIMATED COST", "REVISED_COST",
        "EXISTING_USE", "EXISTING_UNITS", "PROPOSED_USE", "PROPOSED_UNITS",
        "PLANSETS", "15_DAY_HOLD?", "EXISTING_STORIES", "PROPOSED_STORIES",
        "ASSESSOR_STORIES", "VOLUNTARY", "PAGES", "BLOCK", "LOT",
        "STREET_NUMBER", "STREET_NUMBER_SFX", "AVS_STREET_NAME",
        "AVS_STREET_SFX", "UNIT", "UNIT_SFX", "FIRST_NAME", "LAST_NAME",
        "CONTRACTORPHONE", "COMPANY_NAME", "STREET_NUMBER", "STREET",
        "STREET_SUFFIX", "CITY", "STATE", "ZIP_CODE", "CONTACT_NAME",
        "CONTACT_PHONE", "DESCRIPTION",
    ]
    # Clean up the data and change every column to strings to be on the safe side :)
    # The replace turns every '-' inside string cells into '0'.
    df = df.replace({'-': '0'}, regex=True)
    df = df.fillna(0)
    df = df.astype(str)
    # Convert the dataframe to a list of dictionaries (JSON-like records) that any NoSQL database can consume
    items = df.to_dict(orient='records')
    # Connect to DynamoDB using boto3
    resource = boto3.resource('dynamodb', region_name='us-west-2')
    # Connect to the DynamoDB table
    table = resource.Table('permitdata')
    # Load the records created above using the put_item method
    for permit in items:
        table.put_item(Item=permit)


def handler(event, context):
    for record in event['Records']:
        print(record)
        bucket = record['s3']['bucket']['name']
        print(bucket)
        # Object keys arrive URL-encoded in S3 event notifications
        key = unquote_plus(record['s3']['object']['key'])
        print(key)
        # Use only the file name so keys containing "/" do not point at missing /tmp subdirectories
        download_path = '/tmp/{}{}'.format(uuid.uuid4(), os.path.basename(key))
        s3_client.download_file(bucket, key, download_path)
        upload_to_dynamodb(download_path)


def main():
    # Local test run (an assumption, not in the original, where `event` was undefined):
    # read a sample S3 event from the JSON file given as the first argument.
    with open(sys.argv[1]) as f:
        event = json.load(f)
    handler(event, None)


if __name__ == "__main__":
    main()
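The code above converts every column to a string "to be on the safe side". One likely reason (an assumption; the author does not say) is that boto3 rejects Python `float` values and expects `decimal.Decimal` for numbers. If keeping numeric types matters, a rough alternative is to convert each row before writing it. The helper name `to_dynamodb_item` below is hypothetical, and the sketch assumes the row holds plain Python values; other types such as dates would still need their own handling.

```python
import math
from decimal import Decimal


def to_dynamodb_item(row):
    """Return a copy of row with floats converted to Decimal and missing values dropped."""
    item = {}
    for name, value in row.items():
        if value is None:
            continue  # skip missing values instead of storing a placeholder
        if isinstance(value, float):
            if not math.isfinite(value):
                # NaN (how pandas marks empty cells) and infinities
                # cannot be stored as DynamoDB numbers
                continue
            # Go through str() so the Decimal matches the printed float
            value = Decimal(str(value))
        item[name] = value
    return item


# Column names borrowed from the answer; the values are made up for illustration.
row = {"PAGES": 3, "REVISED_COST": 1500.5, "UNIT": float("nan"), "LOT": None}
print(to_dynamodb_item(row))
```

Unlike `fillna(0)` followed by `astype(str)`, this drops empty cells rather than storing them as the string "0", which may or may not be what a given table needs.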