
Problem description

import time

from selenium import webdriver


class Hotmail:
    def __init__(self, username, password):
        self.browser = webdriver.Chrome()
        self.username = username
        self.password = password
        self.emailsender = []
        self.emailcontent = []

    def signIn(self):
        self.browser.get("url")
        time.sleep(2)
        self.browser.maximize_window()
        username = self.browser.find_element_by_name("xxx").send_keys(self.username)
        self.browser.find_element_by_xpath("xpath").click()
        time.sleep(3)
        password = self.browser.find_element_by_name("name").send_keys(self.password)
        self.browser.find_element_by_xpath("xpath").click()        
        time.sleep(3)
        self.browser.find_element_by_name("name").click()
        self.browser.find_element_by_id("id").click()

    def getEmails(self):
        self.browser.get("anotherurl")
        time.sleep(2)

        sender = self.browser.find_elements_by_css_selector(".somecode")
        content = self.browser.find_elements_by_css_selector(".someother code")
        for i in sender:
            self.emailsender.append(i.text)
        for a in content:
            self.emailcontent.append(a.text)


hotmail = Hotmail(username, password)
hotmail.signIn()
hotmail.getEmails()
print(hotmail.emailsender)
print(hotmail.emailcontent)
# it is all ok until here, problem is below

for a,b in zip(hotmail.emailsender, hotmail.emailcontent):
    # print(a,b) this way you can see all the results
    with open("output.txt", "w") as output:
        output.write(f"{a}-----------------{b}") 

# I get only first email as result, I want all of them

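As you can see above, I have code that extracts the sender name and subject of each of my emails and then saves them to "output.txt" in the same directory, formatted as "sender--------subject". However, I only get the first email; the rest are not written. Does anyone know how to fix this?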

Edit: if you want to write the results to a docx file instead:

from docx import Document  # provided by the python-docx package
document = Document()
for a,b in zip(hotmail.emailsender, hotmail.emailcontent):
    document.add_paragraph(f"{a}---------------{b} \n")
document.save("yourdocfile.docx")

Tags: python, web-scraping, while-loop, extract, txt

Solution


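You are overwriting the same file again and again. Every pair does get written, but opening the file in "w" mode truncates it each time, so only the last pair written survives.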
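
The effect is easy to reproduce without Selenium. Here is a minimal sketch, assuming two made-up lists in place of hotmail.emailsender and hotmail.emailcontent and a temporary file in place of output.txt (none of these names come from the question):

```python
import os
import tempfile

# Hypothetical sample data standing in for the scraped senders and subjects.
senders = ["alice", "bob", "carol"]
subjects = ["hello", "invoice", "meeting"]

# Temporary location so the sketch does not touch a real output.txt.
path = os.path.join(tempfile.mkdtemp(), "output.txt")

for a, b in zip(senders, subjects):
    # Mode "w" truncates the file on every open, so each pass erases the previous one.
    with open(path, "w") as output:
        output.write(f"{a}-----------------{b}\n")

with open(path) as f:
    remaining = f.read()

print(remaining)  # only the last pair is left in the file
```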

You need to open the file in append mode (note the "a"):

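for a,b in zip(hotmail.emailsender, hotmail.emailcontent):
    # "a" appends to the end of the file instead of truncating it
    with open("output.txt", "a") as output:
        output.write(f"{a}-----------------{b}\n")  # "\n" puts each email on its own line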

Or keep the same file open for the whole loop:

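# the file is opened (and truncated) once, before the loop starts
with open("output.txt", "w") as output:
    for a,b in zip(hotmail.emailsender, hotmail.emailcontent):
        output.write(f"{a}-----------------{b}\n")  # "\n" puts each email on its own line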

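Note that with the first approach, if you run this code several times, the file keeps the results of every run. The second approach resets the file each time the program runs.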
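
To make that difference concrete, here is a small sketch that simulates running the program twice. The save helper, the sample pairs and the file names are assumptions made for illustration, and for brevity the append variant opens the file once per run rather than once per email, which leaves the same contents:

```python
import os
import tempfile


def save(path, pairs, mode):
    """Write one 'sender-----subject' line per pair, opening the file once."""
    with open(path, mode) as output:
        for a, b in pairs:
            output.write(f"{a}-----------------{b}\n")


pairs = [("alice", "hello"), ("bob", "invoice")]  # hypothetical sample data
folder = tempfile.mkdtemp()
append_path = os.path.join(folder, "append.txt")
write_path = os.path.join(folder, "write.txt")

# Simulate two separate runs of the program.
for _ in range(2):
    save(append_path, pairs, "a")  # keeps what earlier runs wrote
    save(write_path, pairs, "w")   # starts from an empty file each run

with open(append_path) as f:
    append_lines = f.read().splitlines()
with open(write_path) as f:
    write_lines = f.read().splitlines()

print(len(append_lines))  # 4: both runs are kept
print(len(write_lines))   # 2: only the latest run remains
```

If duplicated entries across runs are not wanted, the "w" variant is usually the simpler choice.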

