Selenium not getting different URLs

Problem description

I'm using Selenium in Python to scrape a site that loads its content with JavaScript. Here's my code: [https://gist.github.com/elliotmartin/f9cb5021655f913f250b08f39a14dc9d][1]

For some reason, as I loop over different URLs, the get_boards function returns exactly the same results each time.

For example: This URL:

https://playhearthstone.com/en-us/community/leaderboards/?region=US&leaderboardId=STD&seasonId=73&page=1

Gets:

{'FAST44': '1', 'Mesmile': '2', 'Pizza': '3', 'Stacker': '4', 'Jackpot': '5', 'Gavin': '6', 'Monsanto': '7', 'VictorFalcon': '8', 'Cantelope': '9', 'Rozz': '10', 'molino': '11', 'Eddie': '12', 'SwitchSSB': '13', 'Rey': '14', 'wabeka': '15', 'Enrico': '16', 'TheRabbin': '17', 'Jalexander': '18', 'Itim': '19', 'Jay': '20', 'DuVlad': '21', 'Staz': '22', 'BanditKeith': '23', 'Akatsu': '24', 'Montius': '25'}

And this URL:

https://playhearthstone.com/en-us/community/leaderboards/?region=US&leaderboardId=STD&seasonId=73&page=2

Also gets:

{'FAST44': '1', 'Mesmile': '2', 'Pizza': '3', 'Stacker': '4', 'Jackpot': '5', 'Gavin': '6', 'Monsanto': '7', 'VictorFalcon': '8', 'Cantelope': '9', 'Rozz': '10', 'molino': '11', 'Eddie': '12', 'SwitchSSB': '13', 'Rey': '14', 'wabeka': '15', 'Enrico': '16', 'TheRabbin': '17', 'Jalexander': '18', 'Itim': '19', 'Jay': '20', 'DuVlad': '21', 'Staz': '22', 'BanditKeith': '23', 'Akatsu': '24', 'Montius': '25'}

But none of those values appear anywhere in the HTML that the JavaScript loads for the second URL.

So Selenium must not be reloading the new URL? I'm very new to Selenium, so I think that's where my issue lies.

[1]: https://gist.github.com/elliotmartin/f9cb5021655f913f250b08f39a14dc9d

Tags: python-3.x, selenium, web-scraping

Solution


Try it like this:

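from selenium import webdriver

driver = webdriver.Chrome()  # assumption: any WebDriver works; reuse your existing driver if you have one
driver.get('https://playhearthstone.com/en-us/api/community/leaderboardsData?region=US&leaderboardId=STD&seasonId=73')

# Parse the JSON response in the browser and build an {accountid: rank} mapping
rows = driver.execute_script("""
  return JSON.parse(document.body.innerText).leaderboard.rows.reduce(function(acc, o){
    acc[o.accountid] = o.rank;
    return acc;
  }, {});
""")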

This requests the leaderboard data directly from the site's API endpoint and returns it to Python, instead of scraping the JavaScript-rendered page.
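If you would rather do the parsing on the Python side, here is a minimal sketch of the same idea. It assumes the response has the shape implied by the snippet above (`leaderboard.rows`, each row carrying `accountid` and `rank`). The helper names `build_api_url`, `rows_to_ranks` and `fetch_ranks` are made up for illustration. The optional `page` parameter is only a guess that mirrors the query string of the page URLs in the question; I have not confirmed that the API endpoint accepts it.

```python
import json
from urllib.parse import urlencode

API_BASE = "https://playhearthstone.com/en-us/api/community/leaderboardsData"


def build_api_url(region="US", leaderboard_id="STD", season_id=73, page=None):
    """Build the endpoint URL (hypothetical helper)."""
    params = {
        "region": region,
        "leaderboardId": leaderboard_id,
        "seasonId": season_id,
    }
    if page is not None:
        # Assumption: the API takes the same `page` parameter as the page URL.
        params["page"] = page
    return f"{API_BASE}?{urlencode(params)}"


def rows_to_ranks(payload_text):
    """Map accountid -> rank, mirroring the reduce() in the JavaScript version."""
    data = json.loads(payload_text)
    return {row["accountid"]: row["rank"] for row in data["leaderboard"]["rows"]}


def fetch_ranks(driver, page=None):
    """Load the endpoint in an existing WebDriver and parse the body in Python."""
    driver.get(build_api_url(page=page))
    body_text = driver.execute_script("return document.body.innerText")
    return rows_to_ranks(body_text)
```

With an existing `driver`, `fetch_ranks(driver)` should give the same kind of dictionary as the `execute_script` version. Keeping the parsing in a separate function also lets you check it against a saved response without starting a browser.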

