I'm trying to return the address from find_all

Question

I'm trying to web-scrape with Python and Beautiful Soup. The page I'm using is https://www.zoopla.co.uk/for-sale/property/london/?q=London&results_sort=newest_listings&search_source=home

This is as far as I've got:

>>> address = container.find_all("span")
>>> print(address)
[<span class="price-modifier">Guide price</span>, <span class="listing-results-just-added">Just added</span>, <span><a class="listing-results-address" href="/for-sale/details/50074267">Wolseley Road, Crouch End, London N8</a></span>, <span class="interface nearby_stations_schools_national_rail_station" title="Hornsey"></span>, <span class="nearby_stations_schools_name" title="Hornsey">Hornsey</span>, <span class="interface nearby_stations_schools_national_rail_station" title="Crouch Hill"></span>, <span class="nearby_stations_schools_name" title="Crouch Hill">Crouch Hill</span>]

Why doesn't the following work?

address = container.find_all("span", attrs={"class": "listing-results-address"})

I'm trying to get only the address: Wolseley Road, Crouch End, London N8

Tags: python, web-scraping, beautifulsoup

Answer


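The listing-results-address class is set on the <a> element nested inside the <span>; the <span> itself has no class attribute, so filtering <span> elements by that class matches nothing. Search for the <a> tags instead: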

import requests
from bs4 import BeautifulSoup


url = 'https://www.zoopla.co.uk/for-sale/property/london/?q=London&results_sort=newest_listings&search_source=home'
soup = BeautifulSoup(requests.get(url).content, 'html.parser')

for a in soup.find_all('a', class_='listing-results-address'):
    print(a.get_text(strip=True))

Prints:

Linstead Way, London SW18
Tudor Court, London E17
Pendlestone Road, Walthamstow, ...
Discovery House, Juniper Drive, Wandsworth, London SW18
Woodlea Grove, Northwood HA6
Elsham Road, London W14
Lytham Street, London SE17
Isleworth, London TW7
Islip Manor Road, Northolt UB5
Teignmouth Road, Welling, Kent DA16
Wimpole Street, London W1G
Cranborne Crescent, Potters Bar, Herts EN6
Forest Road, London E17
Highclere Road, New Malden KT3
Coppermill Lane, London E17
Diana Road, London E17
Chiswick High Road, London W4
Holmesdale Road, London SE25
Warrington Crescent, London W9
Grasmere Road, Purley CR8
Bonar Place, Chislehurst BR7
Samos Road, London SE20
Tredegar Road, London E3
Widdenham Road, Islington, London N7
Eddystone Road, London SE4
Benhurst Avenue, Hornchurch RM12
Woodfield Gardens, New Malden KT3
Old Road, London SE13
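To see why the span filter from the question comes back empty, here is a small offline check. It assumes the markup of one listing looks like the print(address) output quoted in the question (a trimmed copy is pasted in as a string), so it does not contact the live site. It also shows a CSS-selector alternative via select_one.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the markup printed in the question (assumed representative of one listing).
html = """
<div>
  <span class="price-modifier">Guide price</span>
  <span class="listing-results-just-added">Just added</span>
  <span><a class="listing-results-address" href="/for-sale/details/50074267">Wolseley Road, Crouch End, London N8</a></span>
  <span class="nearby_stations_schools_name" title="Hornsey">Hornsey</span>
</div>
"""
container = BeautifulSoup(html, "html.parser")

# The class is on the inner <a>, not on <span>, so this matches nothing.
spans = container.find_all("span", attrs={"class": "listing-results-address"})
print(spans)

# Searching for the <a> that carries the class finds the address.
link = container.find("a", class_="listing-results-address")
print(link.get_text(strip=True))

# Equivalent CSS selector: an <a> with that class directly inside a <span>.
same = container.select_one("span > a.listing-results-address")
print(same["href"])
```

The first print shows an empty list, the second the address text, and the third the listing's relative link.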

Edit: to get the price along with each address, you can do the following:

import requests
from bs4 import BeautifulSoup


url = 'https://www.zoopla.co.uk/for-sale/property/london/?q=London&results_sort=newest_listings&search_source=home'
soup = BeautifulSoup(requests.get(url).content, 'html.parser')

for a in soup.find_all('a', class_='listing-results-address'):
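    # Nearest preceding price element; take only its first text node (the price itself).
    # string=True replaces the older, deprecated text=True argument.
    price = a.find_previous(class_='listing-results-price').find(string=True).strip()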
    print('{:<15} {}'.format(price, a.get_text(strip=True)))

Prints:

£1,500,000      Brondesbury Park, Brondesbury ...
£460,000        2D Harold Road, Upper Norwood SE19
£450,000        Anerley Road, London SE20
£450,000        Grange Road, London SE19
£225,000        Bath Road, Harlington, Hayes UB3
£440,000        George Beard Road, London SE8
£615,000        Cumberland Drive, Chessington KT9
£800,000        Woodmansterne Road, Carshalton SM5
£600,000        Willow Close, Bexley DA5
£165,000        Essex Road, Islington On The Green, Islington, London N1
£695,000        Advance House, 101 Ladbroke Grove, London W11
£1,500,000      Riverview Gardens, London SW13
£350,000        Church Road, London SE19
£935,000        Ansdell Road, Nunhead SE15
£350,000        Marlborough Close, London SE17
£380,000        Graveney Road, London SW17
£360,000        Violet Lane, Croydon CR0
£325,000        Montana Gardens, Sutton SM1
£550,000        Albert Road, Bromley, Kent BR2
£365,000        Hadleigh Walk, London E6
£650,000        Eton Rise, Eton College Road, London NW3
£480,000        Russell Road, London N13
£500,000        Heligan House, Watergarden Square, Canada Water SE16
£1,850,000      Melrose Gardens, Brook Green, London W6
£475,000        Cowper Close, Welling DA16
£4,950,000      Edwardes Square, London W8
£735,000        Arbuthnot Road, New Cross SE14
£750,000        Gosterwood Street, London SE8
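One caveat with find_previous: it walks backwards through the whole document, so if a listing ever lacks a price element, it silently picks up the previous listing's price. A minimal sketch of scoping the lookup to each listing instead is below. The markup is hypothetical: only the listing-results-price and listing-results-address classes come from the answer; the <li class="listing"> wrapper is an assumption, so check the real page for the actual container element.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: each listing wrapped in <li class="listing"> (assumed wrapper).
# The second listing deliberately has no price element.
html = """
<ul>
  <li class="listing">
    <a class="listing-results-price" href="#">£460,000 <span>Guide price</span></a>
    <a class="listing-results-address" href="#">2D Harold Road, Upper Norwood SE19</a>
  </li>
  <li class="listing">
    <a class="listing-results-address" href="#">Anerley Road, London SE20</a>
  </li>
</ul>
"""
soup = BeautifulSoup(html, "html.parser")

rows = []
for listing in soup.find_all("li", class_="listing"):
    address = listing.find("a", class_="listing-results-address")
    price_tag = listing.find(class_="listing-results-price")
    # First text node only, so nested labels like "Guide price" are skipped.
    price = price_tag.find(string=True).strip() if price_tag else "N/A"
    rows.append((price, address.get_text(strip=True)))

for price, address in rows:
    print("{:<15} {}".format(price, address))

# For comparison: find_previous borrows the first listing's price for the second one.
second = soup.find_all("a", class_="listing-results-address")[1]
borrowed = second.find_previous(class_="listing-results-price").find(string=True).strip()
print(borrowed)
```

Searching inside each listing costs nothing extra and makes a missing price show up as "N/A" rather than as a wrong value.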
