首页 > 解决方案 > 为什么beautifulSoup 找不到text 参数传入的文本?

问题描述

这是我试图提取运费的 URL:

url = "https://www.amazon.com/AmazonBasics-Ultra-Soft-Micromink-Sherpa-Blanket/dp/B0843ZJGNP/ref=sr_1_1_sspa?dchild=1&keywords=amazonbasics&pd_rd_r=5cb1aaf8-d692-4abf-9131-ebd533ad5763&pd_rd_w=8Uw69&pd_rd_wg=kTKEB&pf_rd_p=9349ffb9-3aaa-476f-8532-6a4a5c3da3e7&pf_rd_r=PYFBYA98FS6B8BR7TGJD&qid=1623412994&sr=8-1-spons&psc=1&spLa=ZW5jcnlwdGVkUXVhbGlmaWVyPUEzM0xaSFIzVzFTUUpMJmVuY3J5cHRlZElkPUEwNzk3MjgzM1NQRlFQQkc4VFJGWSZlbmNyeXB0ZWRBZElkPUEwNzU1NzM0M0VMQ1hTNDJFTzYxQyZ3aWRnZXROYW1lPXNwX2F0ZiZhY3Rpb249Y2xpY2tSZWRpcmVjdCZkb05vdExvZ0NsaWNrPXRydWU="

我的代码是:

r = requests.get(url,headers=HEADERS,proxies=proxyDict)
soup = BeautifulSoup(r.content,'html.parser')
needle="$93.63" 
#I also tried complete sentences
#"$93.63 Shipping & Import Fees Deposit to India"
#"$93.63 Shipping & Import Fees Deposit to India"

print(soup.find_all(text=needle)) 
#I also tried print(soup.find_all(text=re.compile(needle)))

但这总是返回一个空列表。我可以在检查元素中看到所需的文本以及我在控制台上打印的下载汤。但是,当我对实际产品价格(27.99 美元)做同样的事情时,soup.find_all()可以按预期工作。到目前为止,我还无法找出这里的问题。抱歉有任何愚蠢的错误。

标签: pythonbeautifulsoupamazon

解决方案


搜索字段,而不是值。

import requests
from bs4 import BeautifulSoup

url = "https://www.amazon.com/AmazonBasics-Ultra-Soft-Micromink-Sherpa-Blanket/dp/B0843ZJGNP/ref=sr_1_1_sspa?dchild=1&keywords=amazonbasics&pd_rd_r=5cb1aaf8-d692-4abf-9131-ebd533ad5763&pd_rd_w=8Uw69&pd_rd_wg=kTKEB&pf_rd_p=9349ffb9-3aaa-476f-8532-6a4a5c3da3e7&pf_rd_r=PYFBYA98FS6B8BR7TGJD&qid=1623412994&sr=8-1-spons&psc=1&spLa=ZW5jcnlwdGVkUXVhbGlmaWVyPUEzM0xaSFIzVzFTUUpMJmVuY3J5cHRlZElkPUEwNzk3MjgzM1NQRlFQQkc4VFJGWSZlbmNyeXB0ZWRBZElkPUEwNzU1NzM0M0VMQ1hTNDJFTzYxQyZ3aWRnZXROYW1lPXNwX2F0ZiZhY3Rpb249Y2xpY2tSZWRpcmVjdCZkb05vdExvZ0NsaWNrPXRydWU="

HEADERS = ({'User-Agent':
            'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/44.0.2403.157 Safari/537.36',
            'Accept-Language': 'en-US, en;q=0.5'})

r = requests.get(url, headers=HEADERS)
soup = BeautifulSoup(r.content,'html.parser')

value = soup.find("span", {"id" : "priceblock_ourprice"}).contents

print(value)

推荐阅读