Combining two blocks and looping over both

Problem Description

Hi, sorry for the vague title, but I'm practicing web scraping with Selenium. I have a list of links, `urls_to_scrape`, and for each URL I want to visit the page and extract certain elements. I've been able to extract each element, but now I'm stuck on how to do this for every URL in the list. See the code below.

urls_to_scrape # list containing urls I want to perform the code below for 
               # each url


results = []

articles = driver.find_elements_by_css_selector('#MainW article')

counter = 1

for article in articles:
  result = {}
  try:
     title = article.find_element_by_css_selector('a').text
  except: 
     continue

 counter = counter + 1

 excerpt = article.find_element_by_css_selector('div > div > p').text

 author = 
 article.find_element_by_css_selector('div > footer > address > a').text

 date = article.find_element_by_css_selector('div > footer > time').text

 link=
 article.find_element_by_css_selector('div>h2>a').get_attribute('href')

 result['title'] = title
 result['excerpt'] = excerpt
 result['author'] = author
 result['date'] = date
 result['link'] = link

 results.append(result)

Tags: python, selenium

Solution


I think you have an indentation problem. Try this:

urls_to_scrape # list containing urls I want to perform the code below for 
               # each url


results = []

articles = driver.find_elements_by_css_selector('#MainW article')

counter = 1

for article in articles:
    result = {}
    try:
        title = article.find_element_by_css_selector('a').text
    except: 
        continue

    counter = counter + 1

    excerpt = article.find_element_by_css_selector('div > div > p').text

    author = article.find_element_by_css_selector('div > footer > address > a').text

    date = article.find_element_by_css_selector('div > footer > time').text

    link = article.find_element_by_css_selector('div>h2>a').get_attribute('href')

    result['title'] = title
    result['excerpt'] = excerpt
    result['author'] = author
    result['date'] = date
    result['link'] = link
    results.append(result)
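
One caveat not in the original answer: the `find_element_by_css_selector` / `find_elements_by_css_selector` helpers were removed in Selenium 4, so on a current install the equivalent lookups go through `By` locators, roughly:

from selenium.webdriver.common.by import By

# Selenium 4 equivalents of the calls used above
articles = driver.find_elements(By.CSS_SELECTOR, '#MainW article')
title = article.find_element(By.CSS_SELECTOR, 'a').text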

Where does `driver` come from, by the way? You haven't shown the line that fetches the URL. That line also matters for scraping multiple URLs.
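
To actually run the block above for every URL in `urls_to_scrape`, you'd wrap it in an outer loop that navigates first. A minimal sketch, assuming `driver` is an already-initialized WebDriver (e.g. `webdriver.Chrome()`) and the list holds full page URLs:

# Sketch only: `driver` and `urls_to_scrape` are assumed to exist,
# since neither is shown in the question.
results = []

for url in urls_to_scrape:
    driver.get(url)                      # load the page before querying it

    articles = driver.find_elements_by_css_selector('#MainW article')

    for article in articles:
        result = {}
        try:
            result['title'] = article.find_element_by_css_selector('a').text
        except Exception:
            continue                     # skip articles without a title link

        result['excerpt'] = article.find_element_by_css_selector('div > div > p').text
        result['author'] = article.find_element_by_css_selector('div > footer > address > a').text
        result['date'] = article.find_element_by_css_selector('div > footer > time').text
        result['link'] = article.find_element_by_css_selector('div > h2 > a').get_attribute('href')

        results.append(result)

Because `results` is created once, outside both loops, the articles from every page end up in a single list.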

