python - Data being overwritten when exporting to Excel
Problem description
I am scraping a website to collect the ten most recent articles for a set of keywords. Once I have my data (the keyword used, the article title, the URL/hyperlink, and the publication date), I want to write it all to an xls file. So far it only writes the results for the last keyword rather than for all four; it just keeps overwriting the same part of the spreadsheet. How can I get my whole list to show up, not just the most recent batch?
import requests
from bs4 import BeautifulSoup
import datetime
import xlwt
from xlwt import Formula

today = datetime.date.today().strftime("%Y%m%d")

keywords = ('PNC', 'Huntington', 'KeyCorp', 'Fifth Third')
for keyword in keywords:
    keyword.replace("+", " ")  # note: str.replace returns a new string, so this loop has no effect

headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.94 Safari/537.36'}

def article_fetch(keyword):
    url = 'https://www.americanbanker.com/search?query={}'.format(keyword)
    r = requests.get(url, headers=headers)
    soup = BeautifulSoup(r.text, 'html.parser')
    mylist = []
    cols = "KeyWord", "Article", "URL", "Publication Date"
    mylist.append(cols)
    for articles in soup.find_all("div", "feed-item"):
        article = articles.find("h4").text.strip()
        timestamp = articles.find("span", "timestamp").text.strip()
        article_url = 'https://{}'.format(articles.find("a")["href"][2:])
        link = 'HYPERLINK("{}", "Link")'.format(article_url)
        item = [keyword, article, Formula(link), timestamp]
        mylist.append(item)
    # a fresh workbook is created and saved to the same path on every call,
    # so each keyword's results replace the previous keyword's file
    book = xlwt.Workbook()
    sheet = book.add_sheet("Articles")
    for i, row in enumerate(mylist):
        for j, col in enumerate(row):
            sheet.write(i, j, col)
    book.save(r"C:\Python\American Banker\American Banker {}.xls".format(today))

for keyword in keywords:
    article_fetch(keyword)

print('Workbook Saved')
I expected to see my entire list, with results for all four keywords, but I only see the results for the last keyword.
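The overwrite can be reproduced in miniature with xlwt alone, independent of the scraping: Workbook.save writes a complete file, so building a fresh Workbook and saving it once per keyword replaces the previous output each time. A minimal sketch (the file name and row contents here are illustrative only):

    import xlwt

    for keyword in ('A', 'B'):
        book = xlwt.Workbook()             # a fresh, empty workbook each iteration
        sheet = book.add_sheet('Articles')
        sheet.write(0, 0, keyword)
        book.save('demo.xls')              # rewrites the whole file, discarding 'A'
    # demo.xls now holds only the row written for 'B'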
Solution
I moved the Excel file generation to the end of the script:
import requests
from bs4 import BeautifulSoup
import datetime
import xlwt
from xlwt import Formula

today = datetime.date.today().strftime("%Y%m%d")

keywords = ('PNC', 'Huntington', 'KeyCorp', 'Fifth Third')
# str.replace returns a new string, so the result must be captured;
# in the original the return value was discarded and the loop did nothing
keywords = tuple(keyword.replace("+", " ") for keyword in keywords)

headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.94 Safari/537.36'}

def article_fetch(keyword):
    url = 'https://www.americanbanker.com/search?query={}'.format(keyword)
    r = requests.get(url, headers=headers)
    soup = BeautifulSoup(r.text, 'html.parser')
    for articles in soup.find_all("div", "feed-item"):
        article = articles.find("h4").text.strip()
        timestamp = articles.find("span", "timestamp").text.strip()
        article_url = 'https://{}'.format(articles.find("a")["href"][2:])
        link = 'HYPERLINK("{}", "Link")'.format(article_url)
        item = [keyword, article, Formula(link), timestamp]
        mylist.append(item)

# a single shared list accumulates the rows for every keyword
mylist = []
cols = "KeyWord", "Article", "URL", "Publication Date"
mylist.append(cols)

for keyword in keywords:
    article_fetch(keyword)

# the workbook is created and saved exactly once, after all keywords are fetched
book = xlwt.Workbook()
sheet = book.add_sheet('Articles')
for i, row in enumerate(mylist):
    for j, col in enumerate(row):
        sheet.write(i, j, col)
book.save("American Banker {}.xls".format(today))

print('Workbook Saved')
The data is no longer lost.
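As a design note, the same fix can also be expressed without the module-level mylist by having article_fetch return its rows. A sketch of that variant, assuming the same headers and keywords as above (the quote_plus call is an addition here to URL-encode the space in 'Fifth Third'; requests would also percent-encode it on its own):

    import requests
    import xlwt
    from bs4 import BeautifulSoup
    from urllib.parse import quote_plus
    from xlwt import Formula

    def article_fetch(keyword):
        """Return one row per article found for this keyword."""
        url = 'https://www.americanbanker.com/search?query={}'.format(quote_plus(keyword))
        soup = BeautifulSoup(requests.get(url, headers=headers).text, 'html.parser')
        rows = []
        for articles in soup.find_all("div", "feed-item"):
            article = articles.find("h4").text.strip()
            timestamp = articles.find("span", "timestamp").text.strip()
            article_url = 'https://{}'.format(articles.find("a")["href"][2:])
            link = 'HYPERLINK("{}", "Link")'.format(article_url)
            rows.append([keyword, article, Formula(link), timestamp])
        return rows

    mylist = [("KeyWord", "Article", "URL", "Publication Date")]
    for keyword in keywords:
        mylist.extend(article_fetch(keyword))

Returning the rows makes the data flow explicit and keeps the function testable in isolation.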