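python - Extract variable data from all Outlook emails with a specific subject line, then get the date from the body
Question
Every day I receive an email with the quantity of each fruit sold that day. I now have some code that logs the relevant data going forward, but I have not been able to make it work backwards over the emails I have already received.
The data is stored in the body of the email like this: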
Date of report:,01-Jan-2020
Apples,8
Pears,5
Lemons,7
Oranges,9
Tomatoes,6
Melons,3
Bananas,0
Grapes,4
Grapefruit,8
Cucumber,2
Satsuma,1
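What I would like the code to do is search through my emails and find every one that matches a particular subject, iterate over the body line by line to find the variables I am searching for, and then record them in a DataFrame, with the "Date of report" value stored in a date column and converted to the format "%m-%d-%Y".
I think I can achieve this with some amendments to the code I wrote to track the data going forward: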
# change for the fruit you're looking for
Fruit_1 = "Apples"
Fruit_2 = "Pears"
outlook = win32com.client.Dispatch("Outlook.Application").GetNamespace("MAPI")
inbox = outlook.GetDefaultFolder(6)
messages = inbox.Items
messages.Sort("[ReceivedTime]", True)
# find data email
for message in messages:
    if message.subject == 'FRUIT QUANTITIES':
        if Fruit_1 and Fruit_2 in message.body:
            data = str(message.body)
            break
        else:
            print('No data for', Fruit_1, 'or', Fruit_2, 'was found')
            break
fruitd = open("fruitd.txt", "w")  # copy the contents of the latest email into a .txt file
fruitd.write(data)
fruitd.close()
def get_vals(filename: str, searches: list) -> dict:
    # Searches file for search terms and returns the values
    dct = {}
    with open(filename) as file:
        for line in file:
            term, *value = line.strip().split(',')
            if term in searches:
                dct[term] = float(value[0])  # Unpack value
    # if terms are not found update the dictionary w remaining and set value to None
    if len(dct.keys()) != len(searches):
        dct.update({x: None for x in search_terms if x not in dct})
    return dct
searchf = [
    Fruit_1,
    Fruit_2
]  # the list of search terms the function searches for
result = get_vals("fruitd.txt", searchf)  # search for terms
print(result)
# create new dataframe with the values from the dictionary
d = {**{'date': today}, **result}
fruit_vals = pd.DataFrame([d]).rename(columns=lambda z: z.upper())
fruit_vals['DATE'] = pd.to_datetime(fruit_vals['DATE'], format='%d-%m-%Y')
print(fruit_vals)
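I am creating a .txt file called "fruitd" because I was not sure how else to iterate over the body of an email message. Unfortunately, I don't think creating a .txt file for every past email is really feasible, so I was wondering whether there is a better way to do this.
Any advice or pointers would be most welcome.
EDIT: Ideally I would like to get all of the variables in the search list, so Fruit_1 and Fruit_2, with room to extend it to Fruit_3, Fruit_4 and so on if necessary.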
Answer
import pandas as pd
import win32com.client

# PREP THE STUFF
Fruit_1 = "Apples"
Fruit_2 = "Pears"
SEARCHF = [
    Fruit_1,
    Fruit_2
]
# DEF THE STUFF
# modified to take a list of lists of strs as the `report` arg
# (type hints removed)
def get_report_vals(report, searches):
    dct = {}
    for line in report:
        term, *value = line
        # `str.casefold` is similar to `str.lower`, arguably better form
        # if there might ever be a possibility of dealing with non-Latin chars
        if term.casefold().startswith('date'):
            # FIXED (now takes the `date` str out of the list)
            dct['date'] = pd.to_datetime(value[0])
        elif term in searches:
            dct[term] = float(value[0])
    if len(dct.keys()) != len(searches):
        # corrected `search_terms` (from the question) to `searches`
        dct.update({x: None for x in searches if x not in dct})
    return dct
# DO THE STUFF
outlook = win32com.client.Dispatch("Outlook.Application").GetNamespace("MAPI")
inbox = outlook.GetDefaultFolder(6)  # 6 = olFolderInbox
messages = inbox.Items
messages.Sort("[ReceivedTime]", True)
results = []
for message in messages:
    if message.subject == 'FRUIT QUANTITIES':
        # are you looking for:
        #     Fruit_1 /and/ Fruit_2
        # or:
        #     Fruit_1 /or/ Fruit_2
        if Fruit_1 in message.body and Fruit_2 in message.body:
            # FIXED: split the body in memory instead of writing a .txt file
            data = [line.strip().split(",") for line in message.body.split('\n')]
            results.append(get_report_vals(data, SEARCHF))
        else:
            pass
# one row per matching email, with upper-cased column names
fruit_vals = pd.DataFrame(results)
fruit_vals.columns = [col.upper() for col in fruit_vals.columns]
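The code above leaves the date as a pandas timestamp, while the question asks for the "%m-%d-%Y" format. The following is a minimal sketch of the parsing step on its own, using only the standard library so it can be tried without Outlook. The function name `parse_report` and the `sample_body` string are illustrative assumptions, not part of the original answer, and the sketch assumes the date always looks like `01-Jan-2020` (English month abbreviations).

```python
from datetime import datetime


def parse_report(body, searches):
    """Parse one report body into a dict holding the date and the requested quantities.

    `parse_report` is a hypothetical helper; it assumes comma-separated
    "name,value" lines and a date written as day-abbreviated month-year.
    """
    # Start with every requested column set to None so missing fruits still appear.
    row = {"DATE": None, **{name.upper(): None for name in searches}}
    for line in body.splitlines():
        # Split on the first comma only; a line without a comma yields an empty value.
        term, _, value = line.strip().partition(",")
        value = value.strip()
        if term.casefold().startswith("date"):
            # e.g. "01-Jan-2020" is parsed, then rendered as month-day-year text.
            parsed = datetime.strptime(value, "%d-%b-%Y")
            row["DATE"] = parsed.strftime("%m-%d-%Y")
        elif term in searches and value:
            row[term.upper()] = float(value)
    return row


# Sample text shaped like the email body in the question (Outlook uses \r\n line endings).
sample_body = "Date of report:,01-Jan-2020\r\nApples,8\r\nPears,5\r\nLemons,7\r\n"
row = parse_report(sample_body, ["Apples", "Pears", "Melons"])
print(row)
```

In the Outlook loop, a call such as `results.append(parse_report(message.body, SEARCHF))` could take the place of the `data = ...` and `get_report_vals` lines, and `pd.DataFrame(results)` would then hold the date as a string already in the requested format. Extending the search to more fruits only requires adding names to the list.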