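python - Combining two blocks and looping over both
Problem description
Hello, and sorry for the vague title. I am practicing web scraping with Selenium. I have a list of links, urls_to_scrape, and for each URL I want to visit the page and extract certain elements. I can already extract each element, but I am stuck on how to do this for every URL in the list. Please see the code below.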
urls_to_scrape # list containing urls I want to perform the code below for
               # each url
results = []
articles = driver.find_elements_by_css_selector('#MainW article')
counter = 1
for article in articles:
    result = {}
    try:
        title = article.find_element_by_css_selector('a').text
    except:
        continue
    counter = counter + 1
    excerpt = article.find_element_by_css_selector('div > div > p').text
    author =
    article.find_element_by_css_selector('div > footer > address > a').text
    date = article.find_element_by_css_selector('div > footer > time').text
    link=
    article.find_element_by_css_selector('div>h2>a').get_attribute('href')
    result['title'] = title
    result['excerpt'] = excerpt
    result['author'] = author
    result['date'] = date
    result['link'] = link
    results.append(result)
Solution
I think you have an indentation problem. Try this:
urls_to_scrape # list containing urls I want to perform the code below for
               # each url
results = []
articles = driver.find_elements_by_css_selector('#MainW article')
counter = 1
for article in articles:
    result = {}
    try:
        title = article.find_element_by_css_selector('a').text
    except Exception:
        # No link inside this article: skip it and move on to the next one.
        continue
    counter = counter + 1
    # Each assignment must stay on one line and be indented inside the loop.
    excerpt = article.find_element_by_css_selector('div > div > p').text
    author = article.find_element_by_css_selector('div > footer > address > a').text
    date = article.find_element_by_css_selector('div > footer > time').text
    link = article.find_element_by_css_selector('div>h2>a').get_attribute('href')
    result['title'] = title
    result['excerpt'] = excerpt
    result['author'] = author
    result['date'] = date
    result['link'] = link
    results.append(result)
driver
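By the way, you have not shown the line that actually loads a URL (presumably a driver.get(...) call). That line matters here too, because it is the part that has to run once for each URL in the list.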
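To tie the two pieces together, here is a minimal sketch of looping over the URL list and running the extraction once per page. It is not the asker's or the answerer's exact code. The helper names first_match, text_of, scrape_page and scrape_all are hypothetical. It uses driver.find_elements("css selector", ...) in place of find_elements_by_css_selector, on the assumption that a newer Selenium release is installed where the older helper is no longer available; "css selector" is the string value of selenium's By.CSS_SELECTOR. Unlike the code in the post, a missing excerpt, author, date or link is stored as None rather than raising an exception.

```python
CSS = "css selector"  # same string as selenium.webdriver.common.by.By.CSS_SELECTOR


def first_match(parent, selector):
    """Return the first element under `parent` matching `selector`, or None."""
    matches = parent.find_elements(CSS, selector)
    return matches[0] if matches else None


def text_of(parent, selector):
    """Return the text of the first matching element, or None if there is none."""
    element = first_match(parent, selector)
    return element.text if element is not None else None


def scrape_page(driver):
    """Build one dict per article on the page the driver currently shows."""
    page_results = []
    for article in driver.find_elements(CSS, "#MainW article"):
        title = text_of(article, "a")
        if title is None:
            continue  # same effect as the try/except/continue in the post
        link_element = first_match(article, "div > h2 > a")
        page_results.append({
            "title": title,
            "excerpt": text_of(article, "div > div > p"),
            "author": text_of(article, "div > footer > address > a"),
            "date": text_of(article, "div > footer > time"),
            "link": (link_element.get_attribute("href")
                     if link_element is not None else None),
        })
    return page_results


def scrape_all(driver, urls):
    """Visit every URL in turn and collect the results from all pages."""
    results = []
    for url in urls:
        driver.get(url)  # load the page before querying it
        results.extend(scrape_page(driver))
    return results
```

With a real driver already created, the call would look like results = scrape_all(driver, urls_to_scrape). Keeping the per-page extraction in its own function means the outer loop only has to load each URL and merge the results. Pages that fill in their content with JavaScript may also need an explicit wait after driver.get, which this sketch does not include.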