Unexpected index problem with Selenium text selection

Problem description

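I am using Selenium to practice web scraping on this website and extract the names and Twitter @ handles of the top 100 UK universities: https://www.timeshighereducation.com/news/top-100-most-influential-uk-and-us-universities-on-twitter/2013373.article. I have it working the way I want. However, the site is a little unusual: each university's row sits in a "tr" tag, so I get the rows by class name, and the class names are strange too, alternating between "odd" and "even" from one row to the next. I worked with that and got it running. But I noticed that when I try to index the first item in elementsEven, I get an "index out of range" error, yet print(elementsEven[0]) shows output. I then worked out that I have to start from index 3. Why is that? Why can't I start from 0 when the print statement tells me there is something there? Does anyone know?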

import csv
import os
from dotenv import load_dotenv
from time import sleep
from selenium.webdriver.common.keys import Keys
from selenium.common.exceptions import NoSuchElementException
from msedge.selenium_tools import Edge, EdgeOptions # Microsoft Edge 

# Get edge chromium
options = EdgeOptions()
options.use_chromium = True
driver = Edge(executable_path='./edgedriver_win32/msedgedriver.exe', options=options)

driver.get("https://www.timeshighereducation.com/news/top-100-most-influential-uk-and-us-universities-on-twitter/2013373.article")


elementsOdd = driver.find_elements_by_class_name("odd")
elementsEven = driver.find_elements_by_class_name("even")

card = elementsOdd[49] # FROM 0 TO 49

# Gets university name\n@tag, so we need to split by \n
text = card.find_elements_by_xpath("./td[1]/strong")[0].text
split_text = text.split("\n")
university_name, university_twitter_tag = split_text[0], split_text[1]

print(university_name + " " + university_twitter_tag)


card2 = elementsEven[3] # FROM 3 UP TO 51

# Gets university name\n@tag, so we need to split by \n
text2 = card2.find_elements_by_xpath("./td[1]/strong")[0].text

split_text2 = text2.split("\n")
university_name2, university_twitter_tag2 = split_text2[0], split_text2[1]

print(university_name2 + " " + university_twitter_tag2)

Tags: python, selenium, selenium-webdriver, web-scraping, microsoft-edge

Solution


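You are trying to get the tr elements by their class name. When you call driver.find_elements_by_class_name("even"), it returns every element whose class attribute includes the class name even, whatever its tag, not only the tr rows.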
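The effect can be reproduced offline without a browser. The following is only a rough sketch using Python's standard html.parser; SAMPLE_HTML is a made-up snippet that mimics the layout described here (it is not the real page), and ClassCollector is a hypothetical helper, not part of Selenium.

```python
from html.parser import HTMLParser

# Made-up HTML that mimics the page layout described above (not the real page).
SAMPLE_HTML = """
<div class="field-item even"><h2 class="standfirst">Intro text</h2></div>
<table>
  <tr class="odd"><td><strong>University A</strong></td></tr>
  <tr class="even"><td><strong>University B</strong></td></tr>
  <tr class="odd"><td><strong>University C</strong></td></tr>
  <tr class="even"><td><strong>University D</strong></td></tr>
</table>
"""


class ClassCollector(HTMLParser):
    """Record the tag name of every element whose class list contains a token."""

    def __init__(self, class_token):
        super().__init__()
        self.class_token = class_token
        self.tags = []

    def handle_starttag(self, tag, attrs):
        # The class attribute is a whitespace-separated list of class names.
        classes = (dict(attrs).get("class") or "").split()
        if self.class_token in classes:
            self.tags.append(tag)


collector = ClassCollector("even")
collector.feed(SAMPLE_HTML)

# Matching by class alone picks up the div as well as the rows.
any_tag_matches = collector.tags
# Restricting the match to tr elements leaves only the table rows.
row_matches = [tag for tag in collector.tags if tag == "tr"]

print(any_tag_matches)
print(row_matches)
```

In this toy document the div comes first in the class-only result, which is the same situation as elementsEven[0] on the real page.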

The first element, elementsEven[0], is a div tag:

<div class="field-item even">
    <h2 class="standfirst">
        Most universities in the UK and the US now have a presence on Twitter, but which institutions can claim to be using the micro-blogging social network most effectively?
    </h2>
</div>

You can check this with the following lines:

card2 = elementsEven[0] # You can try for other index elements.
print(card2.get_attribute("innerHTML"))
<h2 class="standfirst">Most universities in the UK and the US now have a presence on Twitter, but which institutions can claim to be using the micro-blogging social network most effectively?</h2>

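To see which elements have even in their class name, try the following XPath against the DOM (for example in the browser's developer tools). It highlights 103 elements.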

//*[contains(@class,'even')]
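One caveat worth keeping in mind: contains(@class,'even') is a plain substring test on the attribute value, while a class-name lookup matches whole class names only. The sketch below illustrates the difference with two hypothetical helper functions and a made-up list of class attributes; "event-list" is an invented example, and nothing here says the real page contains such a class.

```python
def xpath_contains_match(class_attr, needle):
    """Mimic XPath contains(@class, needle): a plain substring test."""
    return needle in class_attr


def class_token_match(class_attr, token):
    """Mimic class-name matching: token must be one whole class in the list."""
    return token in class_attr.split()


# Made-up class attribute values for illustration only.
samples = ["even", "field-item even", "event-list", "odd"]

substring_hits = [c for c in samples if xpath_contains_match(c, "even")]
token_hits = [c for c in samples if class_token_match(c, "even")]

print(substring_hits)
print(token_hits)
```

So the XPath count is an upper bound: it may include elements that a class-name lookup would skip.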

If you want to extract the details with the correct indices, use a locator that matches only the tr rows.

elementsEven = driver.find_elements_by_css_selector("tr[class=even]")
# or
elementsEven = driver.find_elements_by_xpath("//tr[@class='even']")
card2 = elementsEven[0]
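Building on the tr-only locator, a possible way to collect every name and handle in one pass is sketched below. It assumes the Selenium 3 style methods used in the question (newer Selenium releases use driver.find_elements(By.CSS_SELECTOR, ...) instead) and assumes the rows carry the classes odd and even as described. The function names split_name_and_handle and collect_universities are hypothetical, introduced only for this illustration.

```python
def split_name_and_handle(cell_text):
    """Split 'University name\\n@handle' into a (name, handle) tuple.

    Returns None when the text does not contain both parts, instead of
    raising IndexError the way split_text[1] would.
    """
    parts = [part.strip() for part in cell_text.split("\n") if part.strip()]
    if len(parts) < 2:
        return None
    return parts[0], parts[1]


def collect_universities(driver):
    """Return (name, handle) pairs for every odd/even table row.

    `driver` is assumed to be a Selenium 3 style WebDriver, as in the question.
    """
    results = []
    # "tr.odd, tr.even" matches only tr elements, so the intro div is excluded.
    for row in driver.find_elements_by_css_selector("tr.odd, tr.even"):
        cells = row.find_elements_by_xpath("./td[1]/strong")
        if not cells:
            continue  # row without the expected cell, skip it
        pair = split_name_and_handle(cells[0].text)
        if pair is not None:
            results.append(pair)
    return results
```

With the driver from the question, collect_universities(driver) would replace the separate elementsOdd and elementsEven lists, and rows that lack the expected cell are skipped rather than raising an index error.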
