How do I export scraped news headlines to a CSV file, and how do I append new records?

Problem description

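I'm new to Python and have built a web scraper that pulls new articles from the CNN top stories. When I print() the results, the output appears as one item per line. I'd like to export the results to a CSV file so that each headline is its own row. I'd also like an appending version, so that each run adds to the file instead of overwriting it. The question is how to get the results to look like this in the CSV file: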

1) Headline 1 from the scraped data 2) Headline 2 from the scraped data 3) Headline 3 from the scraped data, and so on.

I've pasted my code below:

from bs4 import BeautifulSoup
import requests
import csv
# Enter the website you wish to pull news articles from
res = requests.get('http://money.cnn.com/')
soup = BeautifulSoup(res.text, 'lxml')
# Get the <ul> class from the website by right-clicking and choosing "Inspect element"
news_box = soup.find('ul', {'class': '_6322dd28 ad271c3f'})
# Drill down into the <li> items; each should contain an <a>, which holds the headline of the article shown
all_news = news_box.find_all('a')

for news in all_news:
  test=  (news.text)
  print(test)
with open('index.csv', 'w') as fobj:
    csvwriter = csv.writer(fobj, delimiter=',')
    for row in test:
        csvwriter.writerow(test)

Tags: python, web-scraping, export-to-csv

Solution


You can use re.compile with BeautifulSoup.find_all:

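from bs4 import BeautifulSoup as soup
import requests, re
import csv
# Download the page and parse it
d = soup(requests.get('http://money.cnn.com/').text, 'html.parser')
# Keep the non-empty text of <span> tags whose class matches the pattern; skip the first two matches
articles = list(filter(None, [i.text for i in d.find_all('span', {'class':re.compile(r'^\w+ _\w+|^\w+$')})]))[2:]
# 'a' appends on every run; newline='' lets the csv module control line endings
with open('articles.csv', 'a', newline='') as f:
  write = csv.writer(f)
  # One headline per row, in a single column
  write.writerows([[i] for i in articles])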
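
The code in the question fails for two reasons. First, `test` is reassigned on every loop iteration, so only the last headline is left when the file is written. Second, `writerow` treats a string as a sequence of fields, so each character of that headline ends up in its own column. A minimal sketch of the writing step on its own, separate from the scraping, might look like the following. The function name `append_headlines` is a hypothetical helper, not part of the original answer.

```python
import csv

def append_headlines(path, headlines):
    """Append each headline to the CSV at `path` as its own one-column row."""
    # Mode 'a' adds to the end of the file instead of overwriting it.
    # newline='' lets the csv module control line endings.
    with open(path, 'a', newline='', encoding='utf-8') as f:
        writer = csv.writer(f)
        # Wrap each headline in a list: a bare string passed as a row
        # would be split into one field per character.
        writer.writerows([h] for h in headlines)
```

It could be called as `append_headlines('articles.csv', articles)` once the headlines have been collected into a list. Headlines that contain commas are quoted automatically by `csv.writer`.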

Output:

What higher wages means for Domino's and McDonald's  
'Jurassic World' sequel has big opening day amid a surging box office 
Crying migrant girl: What the iconic photo says about press access 
Chanel reveals earnings for the first time in its 108-year history 
Why GE may need to stop paying its 119-year old dividend 
A top Netflix executive is out after using the N-word 
ZTE pays $1 billion fine to US over sanctions violations 
Tariffs on European cars would hurt US auto jobs 
Etsy sellers confront unknowns after Supreme Court ruling 
Chipotle hopes quesadillas and milkshakes bring customers back 
This group is getting ahead in America 
OPEC strikes deal to increase oil production 
Wall Street banks are healthier than ever 
Self-driving Uber driver may have been streaming 'The Voice' 
GM's new Chevy Blazer will be built in Mexico 
"GM is bringing back the Chevy Blazer, an SUV classic "
...
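
Because the file is opened in append mode, running the script again adds the same headlines a second time if they are still on the page. If only new records should be appended, one option is to read the existing rows first and skip anything already present. The sketch below assumes a one-column file like the output shown here; `append_new_headlines` is a hypothetical helper name.

```python
import csv
import os

def append_new_headlines(path, headlines):
    """Append only headlines not already in the one-column CSV at `path`.

    Returns the list of headlines that were actually written.
    """
    seen = set()
    if os.path.exists(path):
        with open(path, newline='', encoding='utf-8') as f:
            # Each non-empty row holds one headline in its first column.
            seen = {row[0] for row in csv.reader(f) if row}

    new_rows = []
    for headline in headlines:
        if headline not in seen:
            seen.add(headline)  # also guards against duplicates within this batch
            new_rows.append(headline)

    with open(path, 'a', newline='', encoding='utf-8') as f:
        csv.writer(f).writerows([h] for h in new_rows)
    return new_rows
```

This compares headlines by exact text, so a headline that is reworded slightly between runs would still be appended as a new row.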
