How to scrape a link from faceit

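Problem description

I am trying to scrape a link from a faceit room page. This is what I have tried, but it does not work. Any help is much appreciated!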

import requests
from bs4 import BeautifulSoup
r = requests.get('https://www.faceit.com/en/csgo/room/1-8d6729b5-cfeb-4059-8894-3b07e04e76b2')
soup = BeautifulSoup(r.content, 'html.parser')
extracted_link = soup.find_all('href', class_='list-unstyled')
print(extracted_link)

Example link: https://www.faceit.com/en/csgo/room/1-8d6729b5-cfeb-4059-8894-3b07e04e76b2

Example of the link to extract: https://demos-europe-west2.faceit-cdn.net/csgo/f9eadb47-aea5-4672-9499-4f457c7d28bd.dem.gz

Example: https://paste.pics/AQBQY

Tags: python-3.x, web-scraping, beautifulsoup

Solution


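All of the page's content is loaded dynamically by JavaScript, so the HTML that requests downloads does not contain the link and BeautifulSoup never sees it. You will probably be better off using selenium to drive a real browser through a webdriver in headless mode.

For example, with selenium and headless Chrome: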
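One way to confirm this is to look for the link in the raw HTML before reaching for a browser. The sketch below uses only the standard library. The two HTML snippets are made-up stand-ins, not faceit's real markup: one for what a JavaScript-driven site might send to requests, one for what the browser holds after rendering. The names `HrefCollector` and `demo_links` are hypothetical. It also reads `href` as an attribute of `<a>` tags, because `href` is not a tag name, which is a second reason `soup.find_all('href', class_='list-unstyled')` returns an empty list.

```python
from html.parser import HTMLParser


class HrefCollector(HTMLParser):
    """Collect the href attribute of every <a> tag."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # href is an attribute of <a>, not a tag of its own.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def demo_links(html):
    """Return the hrefs in `html` that look like demo downloads (.dem.gz)."""
    parser = HrefCollector()
    parser.feed(html)
    return [href for href in parser.hrefs if href.endswith(".dem.gz")]


# Hypothetical markup: an empty application shell, as served before JavaScript runs.
RAW_HTML = (
    '<html><body><div id="app"></div>'
    '<script src="/app.js"></script></body></html>'
)

# Hypothetical markup: the same page after the browser has rendered it.
RENDERED_HTML = (
    '<html><body><div class="match-vs">'
    '<a class="btn-default" href="https://demos-europe-west2.faceit-cdn.net/csgo/'
    'f9eadb47-aea5-4672-9499-4f457c7d28bd.dem.gz">Watch demo</a>'
    '</div></body></html>'
)

print(demo_links(RAW_HTML))       # the shell contains no demo link
print(demo_links(RENDERED_HTML))  # the rendered page does
```

If the same check on `r.text` from the question finds nothing, the link is being added client-side and a browser-based tool is needed.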

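import time

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By

# Run Chrome without opening a visible window.
options = Options()
options.add_argument("--headless")
driver = webdriver.Chrome(options=options)

url = "https://www.faceit.com/en/csgo/room/1-8d6729b5-cfeb-4059-8894-3b07e04e76b2"
try:
    driver.get(url)
    # Give the page's JavaScript time to render the match room.
    time.sleep(2)
    # find_element_by_css_selector was removed in Selenium 4; use By.CSS_SELECTOR.
    element = driver.find_element(By.CSS_SELECTOR, ".match-vs .btn-default")
    print(element.get_attribute("href"))
finally:
    # Always close the browser, even if the lookup fails.
    driver.quit()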

Output:

https://demos-europe-west2.faceit-cdn.net/csgo/f9eadb47-aea5-4672-9499-4f457c7d28bd.dem.gz
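The fixed `time.sleep(2)` can fail when the page takes longer than two seconds to render. Selenium provides `WebDriverWait` for this case. The underlying idea is to poll until a condition holds or a timeout expires, and the plain-Python sketch below illustrates that idea only. `wait_until` and `fake_lookup` are hypothetical names, not Selenium APIs, and the fake lookup stands in for the browser so the example runs without one.

```python
import time


def wait_until(condition, timeout=10.0, interval=0.25):
    """Call `condition` repeatedly until it returns a truthy value.

    Returns that value, or raises TimeoutError once `timeout` seconds pass.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(interval)


# Stand-in for the browser: the link only "appears" on the third lookup.
attempts = {"count": 0}


def fake_lookup():
    attempts["count"] += 1
    if attempts["count"] < 3:
        return None
    return (
        "https://demos-europe-west2.faceit-cdn.net/csgo/"
        "f9eadb47-aea5-4672-9499-4f457c7d28bd.dem.gz"
    )


link = wait_until(fake_lookup, timeout=5.0, interval=0.01)
print(link)
```

With Selenium itself, the equivalent is `WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.CSS_SELECTOR, ".match-vs .btn-default")))`, where `EC` is `selenium.webdriver.support.expected_conditions`. It returns the element as soon as it exists instead of always sleeping for a fixed time.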
