Unable to get href from a div despite targeting its class

Problem description

I am trying to get the links to all the products on this site: https://www.officeworks.com.au/shop/officeworks/c/technology/audio-speakers/voice-assistant-speakers

For example, for Google Home Mini Chalk I should get https://www.officeworks.com.au/shop/officeworks/p/google-home-mini-chalk-sygminiwe

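However, I can't even reach the div class that sits above the href link. I have tried several variations, all with bs4. Here are the two snippets I was sure would work but didn't: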

First attempt

from bs4 import BeautifulSoup
from urllib.request import Request, urlopen

url_products = []
url = "https://www.officeworks.com.au/shop/officeworks/c/technology/audio-speakers/voice-assistant-speakers"
req = Request(url)
html_page = urlopen(req)
soup = BeautifulSoup(html_page, "lxml")
data = soup.find_all('div', {'class': 'ProductTile__ProductImageWrapper-sc-1dlojg1-2 gRQAGx'})
for div in data:
    links = div.find_all('a')
    for a in links:
        print('https://www.officeworks.com.au/' + a['href'])
        url_products.append('https://www.officeworks.com.au/' + a['href'])

Second attempt

from bs4 import BeautifulSoup
import requests

r = requests.get('https://www.officeworks.com.au/shop/officeworks/c/technology/audio-speakers/voice-assistant-speakers')
soup = BeautifulSoup(r.content, 'lxml')
links = [item['href'] for item in soup.select('.gRQAGx > a')]

I believe I am not targeting the right class, but I can't figure out which one it should be. Thanks in advance!

Tags: python-3.x, web-scraping, beautifulsoup, href

Solution


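You are not getting the expected output because the page is loaded via JavaScript. The HTML that urlopen or requests receives does not contain the product tiles, so you cannot extract them until the JavaScript has been rendered.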
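One way to confirm this diagnosis is to check whether the class you are selecting appears at all in the HTML the server sends. The sketch below uses only the standard library; `static_html` is a made-up stand-in for the raw response (in practice you would pass the text you downloaded), and `class_in_static_html` is a hypothetical helper name.

```python
from html.parser import HTMLParser


class ClassCollector(HTMLParser):
    """Collect every CSS class name that appears in a piece of HTML."""

    def __init__(self):
        super().__init__()
        self.classes = set()

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "class" and value:
                self.classes.update(value.split())


def class_in_static_html(html, class_name):
    """Return True if class_name is used by any tag in the given HTML."""
    collector = ClassCollector()
    collector.feed(html)
    return class_name in collector.classes


# Assumption: an invented example of a page whose content is filled in later by a script.
static_html = '<html><body><div id="root"></div><script src="/app.js"></script></body></html>'
print(class_in_static_html(static_html, "gRQAGx"))
```

If the check returns False for the downloaded page while the browser's inspector shows the class, the element is most likely being created by JavaScript after the page loads.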

So you could use Selenium, but I don't recommend it, as it will slow down your task.
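For completeness, a minimal Selenium sketch might look like the following. It assumes Selenium 4 and a local Chrome installation, and it assumes the product anchors can be matched by the `/shop/officeworks/p/` path seen in the example URL from the question; the selector, the timeout and the helper names are illustrative and have not been checked against the live site.

```python
# Assumption: product links contain this path, as in the example URL from the question.
PRODUCT_SELECTOR = "a[href*='/shop/officeworks/p/']"


def unique_sorted(hrefs):
    """Drop empty values and duplicates, then sort for stable output."""
    return sorted({href for href in hrefs if href})


def selenium_product_links(url, timeout=15):
    """Open the page in Chrome, wait for product anchors, and return their hrefs."""
    # Imported lazily because selenium is a third-party package.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Chrome()
    try:
        driver.get(url)
        # Block until at least one matching anchor exists in the rendered DOM.
        anchors = WebDriverWait(driver, timeout).until(
            EC.presence_of_all_elements_located((By.CSS_SELECTOR, PRODUCT_SELECTOR))
        )
        return unique_sorted(a.get_attribute("href") for a in anchors)
    finally:
        driver.quit()
```

Starting a full browser for every page is what makes this approach slow compared with a plain HTTP request.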

Or use HTMLSession from requests_html to render it dynamically.
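A rough sketch of the requests_html route is shown below. It relies on `HTMLSession`, `render()` and `absolute_links` from that library; the `/shop/officeworks/p/` filter is taken from the example URL in the question, while the timeout value and the helper names are assumptions. Note that `render()` downloads a headless Chromium the first time it runs, and the page may need a longer wait before the tiles appear.

```python
def is_product_link(url):
    """Keep only links that look like product pages (pattern from the question)."""
    return "/shop/officeworks/p/" in url


def rendered_product_links(url, timeout=20):
    """Render the page's JavaScript and return the product links found in it."""
    # Imported lazily because requests_html is a third-party package.
    from requests_html import HTMLSession

    session = HTMLSession()
    response = session.get(url)
    # Execute the page's JavaScript in headless Chromium before reading links.
    response.html.render(timeout=timeout)
    return sorted(link for link in response.html.absolute_links if is_product_link(link))
```

Calling `rendered_product_links` with the category URL from the question should then return the product URLs, provided the tiles have rendered within the timeout.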

Otherwise, let's go straight to the source: the API that the JavaScript itself fetches the data from.

I found it by tracking the XHR requests in the Network tab of the browser's developer tools (CTRL + SHIFT + E in Firefox, for example).

So here we can call it directly:

import requests

# Request body copied from the XHR call that the page sends to its search API.
payload = {"requests": [{"indexName": "prod-product-wc-bestmatch-personal", "params": "query=&hitsPerPage=24&maxValuesPerFacet=10&page=0&highlightPreTag=%3Cais-highlight-0000000000%3E&highlightPostTag=%3C%2Fais-highlight-0000000000%3E&clickAnalytics=true&optionalFilters=%5B%5D&sumOrFiltersScores=true&filters=(categorySeoPaths%3A%22technology%2Faudio-speakers%2Fvoice-assistant-speakers%22)&facets=%5B%22rangedOnline%22%2C%22forestProductSchemeName%22%2C%22hardDriveType%22%2C%22bagStyle%22%2C%22socketType%22%2C%22fullSizeInnerDimensions%22%2C%22stapleSize%22%2C%22connectivity%22%2C%22smartHomeCompatibility%22%2C%22industryType%22%2C%22sizeCapacity%22%2C%22performancePrintResolution%22%2C%22handsetIncludedHandsets%22%2C%22usbFlashLidType%22%2C%22videoResolution%22%2C%22maximumPunchingCapacity%22%2C%22rangedRetail%22%2C%22protectionType%22%2C%22rulerLength%22%2C%22sizeNumber%22%2C%22deviceConnectivityTechnology%22%2C%22unitsOfMeasure%22%2C%22selfAdhesive%22%2C%22interfaceHardDrive%22%2C%22sharpenerSize%22%2C%22connectivityWifiBands%22%2C%22microphoneType%22%2C%22labellerKeyboardLayout%22%2C%22numberOfUsb30Ports%22%2C%22operatingSystemEdition%22%2C%22ringRingSize%22%2C%22performanceHealthMonitoringFunctions%22%2C%22connectivityTechnology%22%2C%22dualSimCompatible%22%2C%22audioSource%22%2C%22totalNumberOfLabels%22%2C%22brushShape%22%2C%22maxProcessorClockSpeed%22%2C%22operatingHand%22%2C%22powerBatteryTechnology%22%2C%22travelRegion%22%2C%22capacityBinder%22%2C%22licenceValidityPeriod%22%2C%22storageHardDriveCapacity%22%2C%22spineSize%22%2C%22rollLength%22%2C%22numberOfRings%22%2C%22lightBulbType%22%2C%22colour%22%2C%222SidedCopying%22%2C%22automaticDocumentFeederCapacity%22%2C%22automaticPaperFeed%22%2C%22performanceShredderCutType%22%2C%22performanceBrightness%22%2C%22displayResolution%22%2C%22labellingOfficeUseFacet%22%2C%22securityLevel%22%2C%22maxSupportedDocumentSize%22%2C%22bulkbuyOnline%22%2C%22staplingCapacity%22%2C%22storageIncludedFlashMemory%22%2C%22compatibabilityCustomFitAndroid%22%2C%22drawerNumberOfDrawers%22%2C%22storageInternalMemorySize%22%2C%22ramInstalledSize%22%2C%22100RecycledProduct%22%2C%22placementPlacingMounting%22%2C%22earPlacement%22%2C%22foldedDimensions%22%2C%22portsTotalNumberOfNetworkingPorts%22%2C%22powerBatteryChargeAmpHours%22%2C%22noiseCancelling%22%2C%22surfaceShape%22%2C%22labellingHomeUseFacet%22%2C%22sizeDescription%22%2C%22maxLoadWeight%22%2C%22numberOfPowerPorts%22%2C%22compatibabilityCustomFitApple%22%2C%22tsaApproved%22%2C%22chassisType%22%2C%22surgeSuppression%22%2C%22printingTechnologyPrinters%22%2C%22placementVesaMountCompatibility%22%2C%22boardSizeFacet%22%2C%22frameStyle%22%2C%22serviceProvider%22%2C%22bluetoothCompatibility%22%2C%22scannerType%22%2C%22photoCapacityQuantity%22%2C%22numberOfUsb20Ports%22%2C%22rulingType%22%2C%22learningSkillsFocus%22%2C%22licenceType%22%2C%22connectivityDisplayConnections%22%2C%22performanceMaxThickness%22%2C%22performanceResolution%22%2C%22paperWeightGsm%22%2C%22numberOfProcessorCores%22%2C%22fitsDevice%22%2C%22brushhairtype%22%2C%22opticalZoom%22%2C%22processorClockSpeed%22%2C%22labellingIndustrialUseFacet%22%2C%22performanceApproximateNumberOfImpressions%22%2C%222SidedPrinting%22%2C%22powerPowerType%22%2C%22interfaceType%22%2C%22printerConnectivityTechnology%22%2C%22numberOfReamsPerCarton%22%2C%22baseWheels%22%2C%22performanceEstimatedCartridgeYieldSheets%22%2C%22papersize%22%2C%22processorType%22%2C%22wallStrengthThickness%22%2C%22storageHardDriveCapacityComputingDevices%22%2C%22ciewhiteness%22%2C%22runTime%22%2C%22stampInking%22%2C%22switched%22%2C%22processorManufacturer%22%2C%22deviceCaseCompatibility%22%2C%22caseFeaturesNumberOfCompartments%22%2C%22displaySize%22%2C%222sidedScanning%22%2C%22glutenFree%22%2C%22restTime%22%2C%22operatingPlatformCompatibility%22%2C%22powerSource%22%2C%22touchScreen%22%2C%22displayPanelType%22%2C%22secondaryProcessorType%22%2C%22wastebinCapacityRange%22%2C%22softwareDistributionMedia%22%2C%22learningAgeRange%22%2C%22tapeWidth%22%2C%22storageStorageCapacity%22%2C%22cableLength%22%2C%22skillLevel%22%2C%22flightTime%22%2C%22energyRating%22%2C%22maximumRecommendedDailyUsage%22%2C%22contentLayout%22%2C%22deviceLocation%22%2C%22brand%22%2C%22numberOfUsb31Ports%22%2C%22lidIncluded%22%2C%22scannerScanResolution%22%2C%22portsNumberOfUsbChargePorts%22%2C%22envelopeSize%22%2C%22keyboardCompatibility%22%2C%22primaryCameraVideo%22%2C%22supportedMemoryCards%22%2C%22connectivityDisplayConnectionsPanels%22%2C%22up1Category%22%2C%22price%22%2C%22categorySeoPaths%22%2C%22rangedRetail%22%2C%22rangedOnline%22%2C%22price%22%2C%22brand%22%2C%22colour%22%2C%22audioSource%22%2C%22cableLength%22%2C%22up1Category%22%2C%22bulkbuyOnline%22%2C%22microphoneType%22%2C%22noiseCancelling%22%2C%22bluetoothCompatibility%22%2C%22powerBatteryTechnology%22%2C%22smartHomeCompatibility%22%5D&tagFilters=&facetFilters=%5B%5B%22categorySeoPaths%3Atechnology%2Faudio-speakers%2Fvoice-assistant-speakers%22%5D%5D"}, {"indexName": "prod-product-wc-bestmatch-personal", "params": "query=&hitsPerPage=1&maxValuesPerFacet=10&page=0&highlightPreTag=%3Cais-highlight-0000000000%3E&highlightPostTag=%3C%2Fais-highlight-0000000000%3E&clickAnalytics=false&optionalFilters=%5B%5D&sumOrFiltersScores=true&filters=(categorySeoPaths%3A%22technology%2Faudio-speakers%2Fvoice-assistant-speakers%22)&attributesToRetrieve=%5B%5D&attributesToHighlight=%5B%5D&attributesToSnippet=%5B%5D&tagFilters=&analytics=false&facets=categorySeoPaths"}]}
# The endpoint, application ID and API key are copied from the same XHR request.
r = requests.post("https://k535caawve-dsn.algolia.net/1/indexes/*/queries?x-algolia-agent=Algolia%20for%20JavaScript%20(3.35.1)%3B%20Browser%20(lite)%3B%20react-instantsearch%205.4.0%3B%20JS%20Helper%202.26.1&x-algolia-application-id=K535CAAWVE&x-algolia-api-key=8a831febe0110932cfa06ff0e2024b4f", json=payload).json()

for item in r['results'][0]['hits']:
    print("Name: {:<65}, Url: {}".format(
        item['name'], f"https://www.officeworks.com.au/shop/officeworks/p/{item['urlKeyword']}"))

Output:

Name: Google Home Mini Chalk                                           , Url: https://www.officeworks.com.au/shop/officeworks/p/google-home-mini-chalk-sygminiwe
Name: Google Home Mini Charcoal                                        , Url: https://www.officeworks.com.au/shop/officeworks/p/google-home-mini-charcoal-sygminibk
Name: Google Nest Hub Max Charcoal                                     , Url: https://www.officeworks.com.au/shop/officeworks/p/google-nest-hub-max-charcoal-sygnhmaxbk
Name: Google Nest Hub Max Chalk                                        , Url: https://www.officeworks.com.au/shop/officeworks/p/google-nest-hub-max-chalk-sygnhmaxwe
Name: Google Home                                                      , Url: https://www.officeworks.com.au/shop/officeworks/p/google-home-sygghome
Name: Ultimate Ears Megablast Wireless Speaker with Alexa Graphite     , Url: https://www.officeworks.com.au/shop/officeworks/p/ultimate-ears-megablast-wireless-speaker-with-alexa-graphite-inmblastbk
Name: Google Nest Mini 2nd Generation Charcoal                         , Url: https://www.officeworks.com.au/shop/officeworks/p/google-nest-mini-2nd-generation-charcoal-sygnmini2c
Name: Google Nest Mini 2nd Generation Chalk                            , Url: https://www.officeworks.com.au/shop/officeworks/p/google-nest-mini-2nd-generation-chalk-sygnmini2w
Name: Ultimate Ears Blast Wireless Speaker with Alexa Graphite         , Url: https://www.officeworks.com.au/shop/officeworks/p/ultimate-ears-blast-wireless-speaker-with-alexa-graphite-imblastbk
Name: Amazon 5.5" Echo Show 5 Charcoal                                 , Url: https://www.officeworks.com.au/shop/officeworks/p/amazon-5-5-echo-show-5-charcoal-syecosh5cl
Name: Amazon Echo 3rd Generation Charcoal                              , Url: https://www.officeworks.com.au/shop/officeworks/p/amazon-echo-3rd-generation-charcoal-syaedotclc
Name: JBL Flip Essential Bluetooth Speaker Gun Metal                   , Url: https://www.officeworks.com.au/shop/officeworks/p/jbl-flip-essential-bluetooth-speaker-gun-metal-imjblfless
Name: Ultimate Ears Megablast Wireless Speaker with Alexa Blue         , Url: https://www.officeworks.com.au/shop/officeworks/p/ultimate-ears-megablast-wireless-speaker-with-alexa-blue-inmblastbe
Name: Amazon Echo Dot 3rd Gen With Clock Sandstone                     , Url: https://www.officeworks.com.au/shop/officeworks/p/amazon-echo-dot-3rd-gen-with-clock-sandstone-syaedotcls
Name: Ultimate Ears Megablast Wireless Speaker with Alexa Merlot       , Url: https://www.officeworks.com.au/shop/officeworks/p/ultimate-ears-megablast-wireless-speaker-with-alexa-merlot-inmblastrd
Name: Amazon Echo Dot 3rd Gen Heather Grey                             , Url: https://www.officeworks.com.au/shop/officeworks/p/amazon-echo-dot-3rd-gen-heather-grey-syamdot3ng
Name: Lenovo Smart Clock E27 Starter Pack                              , Url: https://www.officeworks.com.au/shop/officeworks/p/lenovo-smart-clock-e27-starter-pack-sylsmcbun2
Name: Amazon 5.5" Echo Show 5 Sandstone                                , Url: https://www.officeworks.com.au/shop/officeworks/p/amazon-5-5-echo-show-5-sandstone-syecosh5ss
Name: Amazon Echo Studio Black                                         , Url: https://www.officeworks.com.au/shop/officeworks/p/amazon-echo-studio-black-syastudiob
Name: Lenovo Smart Clock B22 Starter Pack                              , Url: https://www.officeworks.com.au/shop/officeworks/p/lenovo-smart-clock-b22-starter-pack-sylsmcbun1
Name: JBL Link View Speaker with Google Assistant                      , Url: https://www.officeworks.com.au/shop/officeworks/p/jbl-link-view-speaker-with-google-assistant-injblinkvw
Name: Ultimate Ears Blast Wireless Speaker with Alexa Blue Steel       , Url: https://www.officeworks.com.au/shop/officeworks/p/ultimate-ears-blast-wireless-speaker-with-alexa-blue-steel-imblastbe
Name: LG WK7 ThinQ WiFi/Bluetooth Speaker with Google Assistant        , Url: https://www.officeworks.com.au/shop/officeworks/p/lg-wk7-thinq-wifi-bluetooth-speaker-with-google-assistant-inlgthinkq
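The `params` strings in the request body are ordinary URL-encoded query strings, so they can be built rather than pasted. The sketch below is an untested assumption about how a trimmed-down request could be assembled: it keeps only `query`, `hitsPerPage`, `page` and `filters` from the original string and drops the long `facets` list, which the API may or may not require. `build_params` and `build_payload` are hypothetical helper names; the index name is the one from the request body above.

```python
from urllib.parse import parse_qs, urlencode

# Index name taken from the request body in the answer.
INDEX_NAME = "prod-product-wc-bestmatch-personal"


def build_params(category_path, page=0, hits_per_page=24):
    """Build a URL-encoded params string for one page of a category."""
    return urlencode({
        "query": "",
        "hitsPerPage": hits_per_page,
        "page": page,  # zero-based, as in the original request (page=0)
        "filters": f'(categorySeoPaths:"{category_path}")',
    })


def build_payload(category_path, page=0):
    """Wrap the params in the same {"requests": [...]} shape used in the answer."""
    return {"requests": [{"indexName": INDEX_NAME,
                          "params": build_params(category_path, page)}]}


# Decode the string again to see what the API would receive.
params = build_params("technology/audio-speakers/voice-assistant-speakers", page=1)
print(parse_qs(params, keep_blank_values=True))
```

If the trimmed request is accepted, passing the result of `build_payload(...)` as the `json=` argument of the `requests.post` call and increasing `page` would be one way to walk through categories that hold more than 24 products.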
