How to find and store dynamically loaded elements using Python Selenium?

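Problem description

I am trying to use Python Selenium to scrape usernames from the list behind the "Followers" button on this profile. I cannot do it, for two reasons:

  1. I cannot scroll the list with driver.execute_script("window.scrollTo(0, document.body.scrollHeight);") because the list has 2 scroll bars (I don't know why it has 2). If I try to scroll, the profile page scrolls instead of the actual list.
  2. Even if I managed to scroll the list, how should I store the usernames? The users are loaded dynamically, and for some reason the class names look like this: class='st--c-PJLV st--c-dhzjXW st--c-edagZx'

I have tried several approaches but could not get the result I want, so any help is appreciated. Here are some code snippets I tried that produced errors: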

scrollElem = driver.find_elements(By.XPATH, "//div[@class='st--c-PJLV st--c-dhzjXW st--c-edagZx']/a")
followernumber = 2000
scrollElem[len(scrollElem)-1].location_once_scrolled_into_view
for i in range(0,followernumber):
    new = len(scrollElem)+i
    newname = driver.find_element(By.XPATH, "(//div[@class='st--c-PJLV st--c-dhzjXW st--c-edagZx']/a)[%i]"%new)
    print(newname.text, i)
    newname.location_once_scrolled_into_view
    time.sleep(1)

This raises the error: selenium.common.exceptions.NoSuchElementException: Message: no such element: Unable to locate element: {"method":"xpath","selector":"(//div[@class='st--c-PJLV st--c-dhzjXW st--c-edagZx']/a)[47]"}

I also tried this algorithm to scroll to the bottom of the list and store the elements as they load, but it did not work either:

def scrollDown():
    last_height = driver.execute_script("return document.body.scrollHeight")
    while True:
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(SCROLL_PAUSE_TIME)
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            break
        last_height = new_height

This algorithm scrolled the profile page instead of the followers list.
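
The loop above reads and scrolls the window (document.body), which would explain why the page moves and the list does not. A minimal sketch of the same loop aimed at a scrollable element instead is shown here. The function name scroll_container_to_end, the container argument (a WebElement for the followers list, located however fits the page), and the max_rounds cap are all assumptions for illustration, not something taken from the page in question.

```python
import time


def scroll_container_to_end(driver, container, pause=1.0, max_rounds=100):
    """Scroll `container` until its scrollHeight stops growing.

    `driver` is a Selenium WebDriver and `container` is the WebElement
    of the scrollable list (hypothetical; it must be located by the caller).
    Returns the number of scroll rounds performed.
    """
    # Read the height of the element itself, not of document.body
    last_height = driver.execute_script(
        "return arguments[0].scrollHeight", container
    )
    for rounds in range(1, max_rounds + 1):
        # Move the element's own scroll bar to the bottom
        driver.execute_script(
            "arguments[0].scrollTop = arguments[0].scrollHeight", container
        )
        time.sleep(pause)  # give newly requested items time to load
        new_height = driver.execute_script(
            "return arguments[0].scrollHeight", container
        )
        if new_height == last_height:
            return rounds  # nothing new was loaded, so stop
        last_height = new_height
    return max_rounds
```

Whether this works on a given page depends on finding the element that actually owns the inner scroll bar, which is not shown here.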

I would appreciate any help, as I am new to web scraping!

Tags: python, selenium, web-scraping

Solution


Try using the requests module to fetch all follower names for that profile:

import requests

link = 'https://hasura2.foundation.app/v1/graphql'
payload = {"query":"query userFollowersQuery($publicKey: String!, $currentUserPublicKey: String!, $offset: Int!, $limit: Int!) {\n  follows: follow(\n    where: {followedUser: {_eq: $publicKey}, isFollowing: {_eq: true}}\n    offset: $offset\n    limit: $limit\n  ) {\n    id\n    user: userByFollowingUser {\n      name\n      username\n      profileImageUrl\n      userIndex\n      publicKey\n      follows(where: {user: {_eq: $currentUserPublicKey}, isFollowing: {_eq: true}}) {\n        createdAt\n        isFollowing\n      }\n    }\n  }\n}\n","variables":{"currentUserPublicKey":"","publicKey":"0xF74d1224931AFa9cf12D06092c1eb1818D1E255C","offset":0,"limit":48},"operationName":"userFollowersQuery"}

with requests.Session() as s:
    s.headers['User-Agent'] = 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.150 Safari/537.36'

    while True:
        resp = s.post(link, json=payload)
        follows = resp.json()['data']['follows']
        # An empty page means every follower has been fetched
        if not follows:
            break
        for item in follows:
            print(item['user']['username'])

        # Move on to the next page of 48 results
        payload['variables']['offset'] += 48
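
The question also asks how to store the usernames rather than just print them. One possible sketch is to wrap the same offset-based paging in a function that returns a list. The name collect_usernames and the post_json callable are hypothetical helpers introduced here for illustration; the code assumes the response keeps the data -> follows -> user -> username shape used in the answer.

```python
import copy


def collect_usernames(post_json, payload):
    """Page through the followers query and return all usernames as a list.

    `post_json` is any callable that takes the payload dict and returns
    the parsed JSON response (hypothetical helper, supplied by the caller).
    """
    payload = copy.deepcopy(payload)  # leave the caller's payload untouched
    step = payload["variables"]["limit"]  # page size, 48 in the answer
    usernames = []
    while True:
        follows = post_json(payload)["data"]["follows"]
        if not follows:  # an empty page means there is nothing left
            break
        usernames.extend(item["user"]["username"] for item in follows)
        payload["variables"]["offset"] += step
    return usernames
```

With the session from the answer, this could be called as collect_usernames(lambda p: s.post(link, json=p).json(), payload), and the returned list can then be written to a file or deduplicated with set() as needed.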
