python - Selenium WebDriver's get_attribute does not return the ® symbol
Question
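Hi, I'd like Python to recognize the ® symbol in the links on the page shown below when I read them with .get_attribute("href"). I have tried many ways of decoding and encoding the value, but none of them seem to work. I suspect the value is UTF-8 encoded, but I can't decode it. The rest of the code looks fine to me, and I have checked every line.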
import csv
import time

from selenium import webdriver
from selenium.webdriver.common.action_chains import ActionChains
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

url = 'https://shopee.com.my/search?keyword=mattress'
driver = webdriver.Chrome(executable_path=r'E:/users/Francabicon/Desktop/Bots/others/chromedriver.exe')
driver.get(url)
time.sleep(0.8)

# select language
driver.find_element_by_xpath('//div[@class="language-selection__list"]/button').click()
time.sleep(3)

def clickpy():
    # scroll a few times to load all items
    for x in range(10):
        driver.execute_script("window.scrollBy(0,300)")
        time.sleep(0.1)

    # get all links (without clicking)
    all_items = driver.find_elements_by_xpath('//a[@data-sqe="link"]')
    all_urls = []
    s = ["-Dr.Alstone-", "-Dr.-Alstone-", "-Lutfy-Paris-"]
    for item in all_items:
        # This gives you the whole URL of the anchor tag
        url = item.get_attribute('href')
        if "-Dr.Alstone-" in url:
            continue
        elif "-Dr.-Alstone-" in url:
            continue
        elif "/Dr.Alstone-" in url:
            continue
        elif "-Simoni-" in url:
            continue
        elif "-Lütfy-" in url:
            continue
        else:
            # Remove the leading host so the href can be matched later for clicking
            urlfinal = url.split('https://shopee.com.my')[1]
            all_urls.append(urlfinal)
    print(all_urls)
    a = len(all_urls)
    print('len:' + str(a))

    # now use links: first five, then last five
    i = 0
    j = a - 5
    while i <= 4:
        # Identify the parent tag by its child anchor using the following XPath
        try:
            WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH, "//div[@class='col-xs-2-4 shopee-search-item-result__item' and @data-sqe='item'][.//a[@data-sqe='link' and @href='" + all_urls[i] + "']]"))).click()
            time.sleep(0.8)
            driver.back()
        except Exception:
            print(all_urls[i] + " doesn't work")
        i += 1
    while j < a:
        try:
            WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH, "//div[@class='col-xs-2-4 shopee-search-item-result__item' and @data-sqe='item'][.//a[@data-sqe='link' and @href='" + all_urls[j] + "']]"))).click()
            time.sleep(0.8)
            driver.back()
        except Exception:
            print(all_urls[j] + " doesn't work")
        j += 1

clickpy()

Is there anything I can do?
Solution
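The value you get back from get_attribute("href") is a URL, so non-ASCII characters such as ® arrive percent-encoded (URL-encoded) as their UTF-8 bytes. You can use urllib.parse.unquote (Python 3) to decode these URL escape sequences.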
https://shopee.com.my/LOVOC-%C2%AE-OORIGINAL-Nordic-Sweden-Single-Mattress-(Comfort-in-a-box-Latex-Pocket-Spring-Sweden-Technology-Mattress)-i.132909000.2627476005
The above is one of the URLs from your example. %C2%AE is the percent-encoded form of ®.
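To see where %C2%AE comes from, it may help to look at the bytes directly. The following is a small sketch using only the standard library; the variable names are illustrative and not part of the question's code.

```python
from urllib.parse import quote, unquote

symbol = "®"  # U+00AE REGISTERED SIGN

# In UTF-8 the symbol is stored as two bytes: 0xC2 0xAE
utf8_bytes = symbol.encode("utf-8")

# Percent-encoding writes each of those bytes as %XX
encoded = quote(symbol)

# unquote reverses the process (UTF-8 is its default encoding)
decoded = unquote(encoded)

print(utf8_bytes)
print(encoded)
print(decoded)
```

Trying to call .decode() or .encode() on the href string does not help here, because the string contains the literal ASCII characters "%C2%AE" rather than the two raw bytes.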
Code:
import urllib.parse
url = 'https://shopee.com.my/LOVOC-%C2%AE-OORIGINAL-Nordic-Sweden-Single-Mattress-(Comfort-in-a-box-Latex-Pocket-Spring-Sweden-Technology-Mattress)-i.132909000.2627476005'
url_decoded = urllib.parse.unquote(url)
print(url_decoded)
Output:
https://shopee.com.my/LOVOC-®-OORIGINAL-Nordic-Sweden-Single-Mattress-(Comfort-in-a-box-Latex-Pocket-Spring-Sweden-Technology-Mattress)-i.132909000.2627476005
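One way this could be applied to the filter loop in the question is to decode each href before comparing it with the brand fragments, so that a check such as "-Lütfy-" has a chance of matching. This is only a sketch under assumptions: decoded_relative and is_excluded are hypothetical helpers, the second sample URL is made up for illustration, and whether the page's raw href attribute contains the literal ® (which matters for the later XPath match) has not been verified against the site.

```python
from urllib.parse import unquote

BASE = "https://shopee.com.my"

# Fragments to skip, copied from the checks in the question
EXCLUDED = ["-Dr.Alstone-", "-Dr.-Alstone-", "/Dr.Alstone-", "-Simoni-", "-Lütfy-"]


def decoded_relative(href):
    """Percent-decode an href and strip the leading host (hypothetical helper)."""
    decoded = unquote(href)
    if decoded.startswith(BASE):
        return decoded[len(BASE):]
    return decoded


def is_excluded(path):
    """Return True if the decoded path contains any excluded fragment."""
    return any(fragment in path for fragment in EXCLUDED)


# Stand-ins for the values returned by item.get_attribute('href')
hrefs = [
    "https://shopee.com.my/LOVOC-%C2%AE-OORIGINAL-Nordic-Sweden-Single-Mattress-(Comfort-in-a-box-Latex-Pocket-Spring-Sweden-Technology-Mattress)-i.132909000.2627476005",
    "https://shopee.com.my/Mattress-L%C3%BCtfy-Paris-i.1.2",  # made-up example
]

all_urls = [path for path in map(decoded_relative, hrefs) if not is_excluded(path)]
print(all_urls)
```

In the Selenium script, the same two helpers could replace the chain of if/else checks, with hrefs coming from item.get_attribute('href') for each anchor.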
Hope this helps!