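python - Some dictionary fields are not written to the CSV file
Problem description
I have scraped results from duckduckgo.com and stored each result's title, link and description. The link and the description show up in the CSV file, but the title does not.
I have printed the title with print(title), and it does produce output.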
class DuckduckgoScraper(web_scraping):
    def scrape(self, search_Term):
        self.filename = search_Term
        self.url = 'https://duckduckgo.com/html?q=' + search_Term
        r = requests.get(self.url, headers=USER_AGENT)
        soup = BeautifulSoup(r.content, 'html5lib')
        result_block = soup.find_all(class_='result__body')
        for result in result_block:
            link = result.find('a', attrs={'class': 'result__a'}, href=True)
            title = result.find('h2')
            description = result.find(attrs={'class': 'result__snippet'})
            if link and title:
                link = link['href']
                title = title.get_text()
                if description:
                    description = description.get_text()
                with open(self.filename + '.csv', 'a', encoding='utf-8', newline='') as csv_file:
                    file_is_empty = os.stat(self.filename + '.csv').st_size == 0
                    fieldname = ['title', 'link', 'description']
                    writer = csv.DictWriter(csv_file, fieldnames=fieldname)
                    if file_is_empty:
                        writer.writeheader()
                    writer.writerow({'title': title, 'link': link, 'description': description})
It does not raise any error.
Solution
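One plausible cause, which the question does not confirm: the text of the <h2> element may begin with a newline and indentation copied from the HTML source. The sketch below uses only the standard library and a made-up title (the sample values and the csv_line helper are hypothetical) to show what csv.DictWriter does with such a value: the field is quoted and keeps its line break, so a spreadsheet may display the cell as if it were empty.

```python
import csv
import io

FIELDS = ['title', 'link', 'description']

def csv_line(row):
    """Return the CSV text that DictWriter produces for a single row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writerow(row)
    return buffer.getvalue()

# Hypothetical scraped row: the raw title still carries the HTML's newline/indentation.
raw = {'title': '\n    Tree - Wikipedia\n', 'link': 'https://example.com/tree', 'description': 'A perennial plant'}
# Same row with every value stripped.
clean = {key: value.strip() for key, value in raw.items()}

print(repr(csv_line(raw)))    # title field is quoted and still starts with a newline
print(repr(csv_line(clean)))  # 'Tree - Wikipedia,https://example.com/tree,A perennial plant\r\n'
```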
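You are opening the CSV file and writing to it on every iteration.
Instead, store the rows in a list and write them all at once at the end with the .writerows() function.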
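A minimal sketch of that pattern, assuming the rows are already plain dicts (the append_rows helper name is hypothetical): the file is opened once, the header is written only while the file is still empty, and writerows() writes every collected row in a single call.

```python
import csv
import os

FIELDS = ['title', 'link', 'description']

def append_rows(path, rows):
    """Append all rows to path with a single open(); write the header only once."""
    with open(path, 'a', encoding='utf-8', newline='') as csv_file:
        # In 'a' mode open() creates the file if needed, so os.stat() is safe here.
        file_is_empty = os.stat(path).st_size == 0
        writer = csv.DictWriter(csv_file, fieldnames=FIELDS)
        if file_is_empty:
            writer.writeheader()
        writer.writerows(rows)
```

Calling append_rows('tree.csv', rows) a second time appends the new rows below the old ones but does not repeat the header.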
Note: it is useful to call .strip() on every item of the row; otherwise Excel/LibreOffice/... may get confused when opening the file.
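A small helper in that spirit, hypothetical and not part of the answer: it strips every value and turns a missing value (None, for example a result without a snippet) into an empty string, so calling .strip() never fails on None.

```python
def clean_row(row):
    """Return a copy of row with every value stripped; None becomes ''."""
    # str(value) also lets non-string values (numbers, etc.) through as text.
    return {key: '' if value is None else str(value).strip() for key, value in row.items()}
```

For example, clean_row({'title': '\n  Tree\n', 'description': None}) returns {'title': 'Tree', 'description': ''}.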
import os
import csv
import requests
from bs4 import BeautifulSoup

USER_AGENT = {'User-Agent': 'Mozilla/5.0'}

def scrape(search_Term):
    filename = search_Term
    url = 'https://duckduckgo.com/html?q=' + search_Term
    r = requests.get(url, headers=USER_AGENT)
    soup = BeautifulSoup(r.content, 'html5lib')
    result_block = soup.find_all(class_='result__body')

    # Collect every row first; the file is opened only once, after the loop.
    rows = []
    for result in result_block:
        link = result.find('a', attrs={'class': 'result__a'}, href=True)
        title = result.find('h2')
        description = result.find(attrs={'class': 'result__snippet'})
        if link and title:
            link = link['href']
            title = title.get_text()
            # Some results have no snippet; use '' so .strip() does not fail on None.
            description = description.get_text() if description else ''
            rows.append({'title': title.strip(), 'link': link.strip(), 'description': description.strip()})

    with open(filename + '.csv', 'a', encoding='utf-8', newline='') as csv_file:
        file_is_empty = os.stat(filename + '.csv').st_size == 0
        fieldname = ['title', 'link', 'description']
        writer = csv.DictWriter(csv_file, fieldnames=fieldname)
        # Mode 'a' appends, so write the header only when the file is new/empty.
        if file_is_empty:
            writer.writeheader()
        writer.writerows(rows)

scrape('tree')
This creates tree.csv. In LibreOffice it looks like this: (screenshot not preserved)
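Opening the file in LibreOffice is a visual check. A standard-library way to confirm that the titles really are in the file is to read it back with csv.DictReader and look for blank title cells; the empty_titles helper below is hypothetical.

```python
import csv

def empty_titles(path):
    """Return the 1-based data-row numbers whose 'title' cell is empty or only whitespace."""
    with open(path, encoding='utf-8', newline='') as csv_file:
        reader = csv.DictReader(csv_file)
        # row.get() returns None for rows that are too short; treat that as empty too.
        return [number for number, row in enumerate(reader, start=1)
                if not (row.get('title') or '').strip()]
```

If every result was written with its title, empty_titles('tree.csv') returns an empty list.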