Scraping Sina News with requests and BeautifulSoup

minorblog 2017-10-05 14:09

### 1. Scrape the title, time, and link of each news item on the Sina News front page

import requests
from bs4 import BeautifulSoup

# Fetch the listing page and decode the response as UTF-8
res = requests.get('http://news.sina.com.cn/china')
res.encoding = 'utf-8'
soup = BeautifulSoup(res.text, 'html.parser')

# Each news entry is an element with the class "news-item"
for news in soup.select('.news-item'):
    # Skip entries that contain no <h2> headline
    if len(news.select('h2')) > 0:
        h2 = news.select('h2')[0].text
        time = news.select('.time')[0].text
        a = news.select('a')[0]['href']
        print(time, h2, a)
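The loop above can also be wrapped in a function that takes an HTML string, which makes it possible to try the selectors without hitting the live site. The following is only a sketch: `extract_news_items` and `SAMPLE_HTML` are hypothetical names, and the sample markup is made up to mirror the class names used above, so the real page layout may differ.

```python
from bs4 import BeautifulSoup

# Made-up markup that mimics the structure the selectors expect (an assumption,
# not a copy of the real Sina page).
SAMPLE_HTML = """
<div class="news-item">
  <h2><a href="http://example.com/a.shtml">First headline</a></h2>
  <div class="time">Sep 26 10:00</div>
</div>
<div class="news-item"></div>
<div class="news-item">
  <h2><a href="http://example.com/b.shtml">Second headline</a></h2>
  <div class="time">Sep 26 11:30</div>
</div>
"""


def extract_news_items(html):
    """Return a list of dicts holding the title, time, and link of each entry."""
    soup = BeautifulSoup(html, 'html.parser')
    items = []
    for news in soup.select('.news-item'):
        h2 = news.select_one('h2')
        time_tag = news.select_one('.time')
        link = news.select_one('a')
        # Skip placeholder entries that lack any of the three pieces
        if h2 is None or time_tag is None or link is None:
            continue
        items.append({
            'title': h2.get_text(strip=True),
            'time': time_tag.get_text(strip=True),
            'link': link.get('href'),
        })
    return items


for item in extract_news_items(SAMPLE_HTML):
    print(item['time'], item['title'], item['link'])
```

Using `select_one` and checking for `None` avoids an `IndexError` when an entry has a headline but no `.time` element or link.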

 

  • Fetch the article body
res = requests.get('http://news.sina.com.cn/o/2017-09-26/doc-ifymenmt7129299.shtml')
res.encoding = 'utf-8'
soup = BeautifulSoup(res.text, 'html.parser')

       Extract the article title
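One possible way to pull the title out of a parsed article page is sketched below. The original selector is not shown here, so the `#article-title` id, the `extract_title` helper, and the sample markup are all assumptions for illustration. Inspect the real page to find the element that actually holds the headline.

```python
from bs4 import BeautifulSoup

# Hypothetical article markup; the id "article-title" is an assumption.
SAMPLE_ARTICLE = """
<html>
  <head><title>Example headline | Example Site</title></head>
  <body>
    <h1 id="article-title"> Example headline </h1>
    <p>Body text.</p>
  </body>
</html>
"""


def extract_title(html, selector='#article-title'):
    """Return the headline text, falling back to the <title> tag."""
    soup = BeautifulSoup(html, 'html.parser')
    tag = soup.select_one(selector)
    if tag is not None:
        return tag.get_text(strip=True)
    # Fall back to the page's <title> element when the selector matches nothing
    if soup.title is not None and soup.title.string:
        return soup.title.string.strip()
    return None


print(extract_title(SAMPLE_ARTICLE))
```

With the live page, the same function could be called as `extract_title(res.text, selector=...)` once the correct selector is known.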

            
