Python WebScraping with Selenium & Chrome

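Problem description

I am trying to scrape a web page, but finding elements by class name does not work. I can see the element's class name in Chrome's Elements panel, but when I search for it as shown below, it returns an empty result.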

from selenium import webdriver
chrome_path = r"C:\webdrivers\chromedriver.exe"
driver = webdriver.Chrome(chrome_path)
driver.get("https://streamelements.com/logna/leaderboard")
usernames = driver.find_elements_by_class_name("md-cell leaderboard-row")
usernames

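I am trying to use this leaderboard page to scrape at least the usernames and their points. A further plan is to record their positions and put everything into an Excel spreadsheet, but that is for the future, not where I am right now.

The output I see from running "usernames" is "[]", which I know means it is empty, but I don't understand why, when I can see the element and its class name and they are exactly the same. I must be missing something, or there is something I don't know.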

Tags: python, selenium, selenium-webdriver, web-scraping, selenium-chromedriver

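Solution

Edit: see the bottom for a better way to get the data; in this case there is no need to scrape the HTML at all.

Got it working! You just need to wait 10 seconds for the page to load, and then search for only one class name. The class-name lookup takes a single class name, so a space-separated value such as "md-cell leaderboard-row" matches nothing: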

import time
from selenium import webdriver


chrome_path = r"C:\webdrivers\chromedriver.exe" # or wherever you have your chrome webdriver installed
driver = webdriver.Chrome(chrome_path)
driver.get("https://streamelements.com/logna/leaderboard")

# let the page load
time.sleep(10)

# list comprehension to return text of each element with class leaderboard-row
usernames = [element.text for element in
             driver.find_elements_by_class_name("leaderboard-row")
             if element.text != '']

print(usernames)

Output:

['underholderen', '42051', 'jimbyj', '39220', 'delynne', '35411', 'rawrnerunya', '30350', 'simmer5k', '25470', 'bloomspeed', '23885', 'jaidav2000', '22386', 'moobot', '18910', 'virgoproz', '18120', 'ottermandela', '18108', 'v_and_k', '17945', 'kalibxi', '17610', 'commanderroot', '17585', 'jujusan', '17575', 'mellowj', '15390', 'itsvodoo', '15080', 'lord_hal', '14945', 'darkk0ala', '14757', 'sirenmatty', '13230', 'myles_27', '12725', 'upsetpoptart', '12204', 'salsichasensuaal', '11535', 'artalartistic', '11519', 'shannonmcbe', '10895', 'winsock', '10850']
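The list above is flat: it alternates a username and that user's points. Here is a minimal sketch of grouping it into pairs, assuming the alternation holds for every row (it would break if a username or points cell were empty and got filtered out). The function name pair_usernames_with_points is just an illustrative name, not part of Selenium.

```python
def pair_usernames_with_points(cells):
    """Group a flat [name, points, name, points, ...] list into (name, points) pairs."""
    if len(cells) % 2 != 0:
        # An odd length means the name/points alternation is broken somewhere.
        raise ValueError("expected an even number of cells")
    names = cells[0::2]    # items at even positions are usernames
    points = cells[1::2]   # items at odd positions are point totals (as text)
    return [(name, int(value)) for name, value in zip(names, points)]


# First three rows taken from the output above.
sample = ['underholderen', '42051', 'jimbyj', '39220', 'delynne', '35411']
leaderboard = pair_usernames_with_points(sample)
print(leaderboard)
```

The position in the resulting list plus one gives each user's rank on the leaderboard.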

If you want to grab data from the other columns in the table, that is possible too.
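On why the original search came back empty, and how to avoid the fixed 10-second sleep: a rough sketch, assuming that both classes ("md-cell" and "leaderboard-row") sit on the same element and that you are on a Selenium version that ships By, WebDriverWait and expected_conditions. The helper names class_attr_to_css and wait_for_rows are made up for illustration. A space-separated class attribute can be matched with a CSS selector by joining the names with dots, and an explicit wait returns as soon as the rows appear instead of always sleeping the full time.

```python
def class_attr_to_css(class_attr):
    """Turn a space-separated class attribute into a CSS selector matching all classes."""
    return "".join("." + name for name in class_attr.split())


def wait_for_rows(driver, class_attr="md-cell leaderboard-row", timeout=10):
    """Wait up to `timeout` seconds for matching elements, then return them."""
    # Imported inside the function so the selector helper works without Selenium installed.
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    selector = class_attr_to_css(class_attr)
    return WebDriverWait(driver, timeout).until(
        EC.presence_of_all_elements_located((By.CSS_SELECTOR, selector))
    )


print(class_attr_to_css("md-cell leaderboard-row"))
```

With a driver that has already loaded the page, wait_for_rows(driver) would stand in for the time.sleep(10) plus the class-name lookup; it raises a TimeoutException if nothing shows up in time.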

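Edit:

Even better, I was able to find the XHR web request that returns the list of top viewers (this is where the data in the table comes from, and it is in JSON format): https://api.streamelements.com/kappa/v2/points/5cf5740dc3334beee6ba64a6/top

You can query it directly and get the data much faster, without any scraping. Let me know and I can show how.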

Edit:

OK, super easy, and WAAAAAAY better:

First, install the requirement:

pip install requests

Then:

import requests

url = 'https://api.streamelements.com/kappa/v2/points/5cf5740dc3334beee6ba64a6/top'

# get a dictionary of the request's json response
usernames = requests.get(url).json()
print(usernames)

Output:

{'_total': 19350, 'users': [{'username': 'underholderen', 'points': 42051}, {'username': 'jimbyj', 'points': 39220}, {'username': 'delynne', 'points': 35411}, {'username': 'rawrnerunya', 'points': 30350}, {'username': 'simmer5k', 'points': 25470}, {'username': 'bloomspeed', 'points': 23885}, {'username': 'jaidav2000', 'points': 22386}, {'username': 'moobot', 'points': 18910}, {'username': 'virgoproz', 'points': 18120}, {'username': 'ottermandela', 'points': 18108}, {'username': 'v_and_k', 'points': 17945}, {'username': 'kalibxi', 'points': 17610}, {'username': 'commanderroot', 'points': 17585}, {'username': 'jujusan', 'points': 17575}, {'username': 'mellowj', 'points': 15390}, {'username': 'itsvodoo', 'points': 15080}, {'username': 'lord_hal', 'points': 14945}, {'username': 'darkk0ala', 'points': 14757}, {'username': 'sirenmatty', 'points': 13230}, {'username': 'myles_27', 'points': 12725}, {'username': 'upsetpoptart', 'points': 12204}, {'username': 'salsichasensuaal', 'points': 11535}, {'username': 'artalartistic', 'points': 11519}, {'username': 'shannonmcbe', 'points': 10895}, {'username': 'winsock', 'points': 10850}, {'username': 'macklelotsmore', 'points': 10688}, {'username': 'kikyobooty', 'points': 10650}, {'username': 'jovikingdomkey', 'points': 10385}, {'username': 'dancerhands', 'points': 10186}, {'username': 'mapplerug45', 'points': 10185}, {'username': 'lurxx', 'points': 10175}, {'username': 'jellycat101', 'points': 9965}, {'username': 'dean_', 'points': 9880}, {'username': 'tagou_', 'points': 9550}, {'username': 'arthiphix', 'points': 9505}, {'username': 'beingred', 'points': 9307}, {'username': 'theemrmark', 'points': 9135}, {'username': 'tiptactoe', 'points': 8710}, {'username': 'aten', 'points': 8660}, {'username': 'sweegol', 'points': 8630}, {'username': 'taramichellee', 'points': 8625}, {'username': 'sindar44', 'points': 8590}, {'username': 'nitestalkrr', 'points': 8570}, {'username': 'swoapy', 'points': 8546}, {'username': 'logviewer', 'points': 
8380}, {'username': 'umental', 'points': 8235}, {'username': 'chesterfield250', 'points': 8171}, {'username': 'theedgecution', 'points': 8152}, {'username': 'dreameater_gd', 'points': 8110}, {'username': 'camirios29', 'points': 7960}, {'username': 'dirty_soul', 'points': 7895}, {'username': 'princesschango', 'points': 7780}, {'username': 'tylerhunsicker', 'points': 7729}, {'username': 'toonybit', 'points': 7655}, {'username': 'angeloflight', 'points': 7515}, {'username': 'fentondy', 'points': 7325}, {'username': 'owgrandma', 'points': 7165}, {'username': 'ohitspb', 'points': 7150}, {'username': 'jayy557', 'points': 7140}, {'username': 'nightbot', 'points': 7125}, {'username': 'therealjt', 'points': 7110}, {'username': 'hawqks', 'points': 6970}, {'username': 'oxsaucy', 'points': 6930}, {'username': 'somoonm', 'points': 6910}, {'username': 'skiesti', 'points': 6890}, {'username': 'adeeduhs', 'points': 6695}, {'username': 'elmolovesdorothy', 'points': 6660}, {'username': 'liquigels', 'points': 6640}, {'username': 'shadowed21', 'points': 6630}, {'username': 'fakerwtd', 'points': 6450}, {'username': 'fragglefusion', 'points': 6440}, {'username': 'kickypip', 'points': 6230}, {'username': 'cerem5', 'points': 6230}, {'username': 'nikkigsus', 'points': 6225}, {'username': 'bigj808', 'points': 6135}, {'username': 'anotherttvviewer', 'points': 6070}, {'username': 'taratv', 'points': 6040}, {'username': 'l0nnix', 'points': 5970}, {'username': 'sainttt', 'points': 5965}, {'username': 'princejay__', 'points': 5905}, {'username': 'oniisammma', 'points': 5886}, {'username': 'marshallpawpatrol', 'points': 5839}, {'username': 'rosayallday', 'points': 5720}, {'username': 'garvsehgal98', 'points': 5700}, {'username': 'beethoven6', 'points': 5695}, {'username': 'nynxii', 'points': 5680}, {'username': 'tilly', 'points': 5672}, {'username': 'godgundam1019', 'points': 5615}, {'username': 'monoclekitteh', 'points': 5605}, {'username': 'steviewondaaa', 'points': 5580}, {'username': 
'ianonymoose', 'points': 5545}, {'username': 'aris1535', 'points': 5477}, {'username': 'rimastino', 'points': 5445}, {'username': 'kodexow', 'points': 5395}, {'username': 'ssondara', 'points': 5360}, {'username': 'cyroku', 'points': 5325}, {'username': 'ankoubzh', 'points': 5250}, {'username': 'sajan_ow', 'points': 5205}, {'username': 'plucik7', 'points': 5125}, {'username': 'sutetchi_', 'points': 5108}]}
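Since the response is a plain dictionary with a 'users' list, picking values out of it takes only a line or two. A small sketch, using the first three entries of the output above as sample data (the variable name points_by_user is just illustrative):

```python
# Trimmed copy of the response shown above (first three users only).
response = {
    '_total': 19350,
    'users': [
        {'username': 'underholderen', 'points': 42051},
        {'username': 'jimbyj', 'points': 39220},
        {'username': 'delynne', 'points': 35411},
    ],
}

# Map each username to its points for quick lookups.
points_by_user = {user['username']: user['points'] for user in response['users']}

print(points_by_user['jimbyj'])
```

The list order is preserved, so the first key is still the top user.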

Edit (again):

Here is how to get it into Excel (the code differs slightly from the above):

First, install openpyxl:

pip install openpyxl

Then run the script:

import requests
import openpyxl as xl


url = 'https://api.streamelements.com/kappa/v2/points/5cf5740dc3334beee6ba64a6/top'

# get a dictionary of the request's json response
response = requests.get(url).json()

# get just the user list
users = response['users']

# add the rank, counting from 1 (enumerate avoids a list search per user)
for rank, user in enumerate(users, start=1):
    user['rank'] = rank

# create the workbook
wb = xl.Workbook()

# go to the active sheet
ws = wb.active

# write the header row
ws.append(list(users[0].keys()))

# write the values for each row
for user in users:
    ws.append(list(user.values()))

# save the workbook
wb.save('./streamelements-kappa.xlsx')
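With the script above, the columns come out as username, points, rank, because the rank key is added last. If you would rather have rank as the first column, one option is to build the rows explicitly. A minimal sketch, where build_sheet_rows is a hypothetical helper and the sample data is the first three users from the response above:

```python
def build_sheet_rows(users):
    """Return a header row followed by one [rank, username, points] row per user."""
    header = ['rank', 'username', 'points']
    rows = [
        [rank, user['username'], user['points']]
        for rank, user in enumerate(users, start=1)  # rank counts from 1
    ]
    return [header] + rows


sample_users = [
    {'username': 'underholderen', 'points': 42051},
    {'username': 'jimbyj', 'points': 39220},
    {'username': 'delynne', 'points': 35411},
]

for row in build_sheet_rows(sample_users):
    print(row)
```

Each of these lists can be passed to ws.append(row) in place of the two append loops in the script.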
