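python - AttributeError: 'NoneType' object has no attribute 'text' when using .text.strip() for web scraping with Selenium and BeautifulSoup
Problem description
I want to scrape prices from a web page. Before merging everything into a single script, I wrote the code block by block, and it worked fine that way, in particular the price part that uses .text.strip():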
!pip install selenium

from selenium import webdriver
import time
from bs4 import BeautifulSoup

# raw string so the backslash in the Windows path is taken literally
driver = webdriver.Chrome(r'D:\chromedriver.exe')
url = "https://www.fashionvalet.com/catalogsearch/result/?q=duck"
driver.get(url)
driver.maximize_window()
time.sleep(3)

btn = driver.find_element_by_xpath('/html/body/main/div/header/div[5]/div[1]/div[1]/div')
btn.click()
time.sleep(5)

soup = BeautifulSoup(driver.page_source, "html.parser")
# select_one returns only the first matching price element on the page
p_price = soup.select_one('.fvPLPProductPrice > strong').text.strip()
#"strong").select_one("strong").text.strip()
print(p_price)
MYR50.00
Unfortunately, once I merged all the code, the price part raises an error at .text.strip():
!pip install selenium

from selenium import webdriver
import time
import pandas as pd
from bs4 import BeautifulSoup

def get_url(product_name):
    product_name = product_name.replace(' ', '+')
    url_template = "https://www.fashionvalet.com/catalogsearch/result/?q={}"
    url = url_template.format(product_name)
    return url

def product_info(card):
    # name
    p_name = card.find('h3').text.strip()

    # price
    #p_rice = card.find("p", "fvPLPProductPrice").select("strong")
    p_price = card.select_one('.fvPLPProductPrice > strong').text.strip()

    # image
    p_image = card.find('img')
    p_img = p_image['src']

    # brand
    p_brand = card.find('p', "fvPLPProductBrand").text.strip()

    # discount percent
    p_dis = card.find('p', "fvPLPProductMeta").text.strip()

    info = (p_name, p_price, p_img, p_brand, p_dis)
    return info

def main(product):
    records = []
    url = get_url(product)  # 1--generate URL
    driver = webdriver.Chrome(r'D:\chromedriver.exe')  # 2--open browser
    driver.get(url)  # 3--open URL
    driver.maximize_window()
    time.sleep(5)

    # BUTTON
    btn = driver.find_element_by_xpath('/html/body/main/div/header/div[5]/div[1]/div[1]/div')
    btn.click()
    time.sleep(5)

    # AUTO-SCROLLING
    # -- keep Python's parsing in step with the page's lazy loading
    temp_height=0
    while True:
        driver.execute_script("window.scrollBy(0,1000)")
        time.sleep(10)
        check_height = driver.execute_script("return document.documentElement.scrollTop || window.pageYOffset || document.body.scrollTop;")
        if check_height==temp_height:
            break
        temp_height=check_height
        time.sleep(5)
    # AUTO-SCROLL end

    soup = BeautifulSoup(driver.page_source, "html.parser")
    product_card = soup.select('.fvPLPProducts > li')

    for allproduct in product_card:
        productDetails = product_info(allproduct)
        records.append(productDetails)

    col = ['Name', 'Price', 'Image', 'Brand', 'Discount']
    all_data = pd.DataFrame(records, columns=col)
    all_data.to_csv('D:\\FASHION-{}.csv'.format(product))
When I run main("duck"), I get the following error:
AttributeError Traceback (most recent call last)
<ipython-input-7-7b75c58eb0da> in <module>
----> 1 main("duck")
<ipython-input-6-7d068e5049f6> in main(product)
70
71 for allproduct in product_card:
---> 72 productDetails = product_info(allproduct)
73 records.append(productDetails)
74
<ipython-input-6-7d068e5049f6> in product_info(card)
20
21 #p_rice = card.find("p", "fvPLPProductPrice").select("strong")
---> 22 p_price = card.select_one('.fvPLPProductPrice > strong').text.strip()
23
24 # image
AttributeError: 'NoneType' object has no attribute 'text'
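I tried removing .text.strip() and the code ran fine, but the output then contains the HTML tags, which is not what I want.
In short, .text.strip() works when the code is run in separate blocks but raises an error once everything is merged.
Can anyone help me? Thank you.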
Solution
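A likely explanation, assuming the page markup matches what the selectors suggest: the block-by-block version only ever looked at a single price element, while main() calls product_info on every li under .fvPLPProducts. If even one of those items has no `.fvPLPProductPrice > strong` element (for example a non-product list item, or a product whose price is marked up differently), select_one returns None and the `.text` access fails. This has not been verified against the live site. The sketch below guards each lookup; text_or_none is a hypothetical helper name, not part of BeautifulSoup.

```python
def text_or_none(tag):
    """Return the stripped text of a tag, or None when the lookup found nothing."""
    return tag.text.strip() if tag is not None else None


def product_info(card):
    # find() and select_one() return None when the card lacks the element,
    # so every text extraction goes through text_or_none.
    p_name = text_or_none(card.find('h3'))
    p_price = text_or_none(card.select_one('.fvPLPProductPrice > strong'))

    # The image tag can be missing as well; .get avoids a KeyError on 'src'.
    p_image = card.find('img')
    p_img = p_image.get('src') if p_image is not None else None

    p_brand = text_or_none(card.find('p', 'fvPLPProductBrand'))
    p_dis = text_or_none(card.find('p', 'fvPLPProductMeta'))

    return (p_name, p_price, p_img, p_brand, p_dis)
```

With this version, a card that lacks a price yields None in the Price column instead of aborting the run. Inspecting those rows in the resulting CSV should show which items the selector misses.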