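python - I'm trying to return the address from find_all
Problem description
I'm trying to do web scraping with Python and Beautiful Soup. Reference URL: https://www.zoopla.co.uk/for-sale/property/london/?q=London&results_sort=newest_listings&search_source=home
This is as far as I have managed to get: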
>>> address = container.find_all("span")
>>> print(address)
[<span class="price-modifier">Guide price</span>, <span class="listing-results-just-added">Just added</span>, <span><a class="listing-results-address" href="/for-sale/details/50074267">Wolseley Road, Crouch End, London N8</a></span>, <span class="interface nearby_stations_schools_national_rail_station" title="Hornsey"></span>, <span class="nearby_stations_schools_name" title="Hornsey">Hornsey</span>, <span class="interface nearby_stations_schools_national_rail_station" title="Crouch Hill"></span>, <span class="nearby_stations_schools_name" title="Crouch Hill">Crouch Hill</span>]
Why doesn't the following work?
address = container.find_all("span", attrs={"class": "listing-results-address"})
I'm trying to get only the address, i.e. Wolseley Road, Crouch End, London N8
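Solution
The class listing-results-address is set on the <a> tag, not on the <span> that wraps it, so you should search for <a> tags instead of <span> tags: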
import requests
from bs4 import BeautifulSoup

url = 'https://www.zoopla.co.uk/for-sale/property/london/?q=London&results_sort=newest_listings&search_source=home'
soup = BeautifulSoup(requests.get(url).content, 'html.parser')

# The class is on the <a> element, so match <a> tags by class.
for a in soup.find_all('a', class_='listing-results-address'):
    print(a.get_text(strip=True))
Prints:
Linstead Way, London SW18
Tudor Court, London E17
Pendlestone Road, Walthamstow, ...
Discovery House, Juniper Drive, Wandsworth, London SW18
Woodlea Grove, Northwood HA6
Elsham Road, London W14
Lytham Street, London SE17
Isleworth, London TW7
Islip Manor Road, Northolt UB5
Teignmouth Road, Welling, Kent DA16
Wimpole Street, London W1G
Cranborne Crescent, Potters Bar, Herts EN6
Forest Road, London E17
Highclere Road, New Malden KT3
Coppermill Lane, London E17
Diana Road, London E17
Chiswick High Road, London W4
Holmesdale Road, London SE25
Warrington Crescent, London W9
Grasmere Road, Purley CR8
Bonar Place, Chislehurst BR7
Samos Road, London SE20
Tredegar Road, London E3
Widdenham Road, Islington, London N7
Eddystone Road, London SE4
Benhurst Avenue, Hornchurch RM12
Woodfield Gardens, New Malden KT3
Old Road, London SE13
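The difference between the two searches can be checked offline. The following is a minimal sketch that reuses a few of the elements printed in the question; the wrapping <div> is an assumption added only so that the fragment has a single root, and it is not taken from the real page.

```python
from bs4 import BeautifulSoup

# Elements copied from the question's output. The outer <div> is hypothetical.
html = """
<div>
  <span class="price-modifier">Guide price</span>
  <span class="listing-results-just-added">Just added</span>
  <span><a class="listing-results-address" href="/for-sale/details/50074267">Wolseley Road, Crouch End, London N8</a></span>
  <span class="nearby_stations_schools_name" title="Hornsey">Hornsey</span>
</div>
"""

container = BeautifulSoup(html, "html.parser")

# No <span> carries this class, so the filter matches nothing.
spans = container.find_all("span", attrs={"class": "listing-results-address"})

# The class belongs to the <a> nested inside a class-less <span>.
links = container.find_all("a", class_="listing-results-address")
addresses = [a.get_text(strip=True) for a in links]

# The same lookup written as a CSS selector.
selected = [a.get_text(strip=True) for a in container.select("a.listing-results-address")]

print(len(spans))
print(addresses)
```

find_all applies the tag name and the attribute filter to the same element, which is why filtering <span> tags by a class that only the nested <a> has returns an empty list.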
Edit: To get the price along with the address, you can do the following:
import requests
from bs4 import BeautifulSoup

url = 'https://www.zoopla.co.uk/for-sale/property/london/?q=London&results_sort=newest_listings&search_source=home'
soup = BeautifulSoup(requests.get(url).content, 'html.parser')

for a in soup.find_all('a', class_='listing-results-address'):
    # find_previous() walks backwards to the nearest preceding price element;
    # string=True (formerly text=True) picks its first text node.
    price = a.find_previous(class_='listing-results-price').find(string=True).strip()
    print('{:<15} {}'.format(price, a.get_text(strip=True)))
Prints:
£1,500,000      Brondesbury Park, Brondesbury ...
£460,000        2D Harold Road, Upper Norwood SE19
£450,000        Anerley Road, London SE20
£450,000        Grange Road, London SE19
£225,000        Bath Road, Harlington, Hayes UB3
£440,000        George Beard Road, London SE8
£615,000        Cumberland Drive, Chessington KT9
£800,000        Woodmansterne Road, Carshalton SM5
£600,000        Willow Close, Bexley DA5
£165,000        Essex Road, Islington On The Green, Islington, London N1
£695,000        Advance House, 101 Ladbroke Grove, London W11
£1,500,000      Riverview Gardens, London SW13
£350,000        Church Road, London SE19
£935,000        Ansdell Road, Nunhead SE15
£350,000        Marlborough Close, London SE17
£380,000        Graveney Road, London SW17
£360,000        Violet Lane, Croydon CR0
£325,000        Montana Gardens, Sutton SM1
£550,000        Albert Road, Bromley, Kent BR2
£365,000        Hadleigh Walk, London E6
£650,000        Eton Rise, Eton College Road, London NW3
£480,000        Russell Road, London N13
£500,000        Heligan House, Watergarden Square, Canada Water SE16
£1,850,000      Melrose Gardens, Brook Green, London W6
£475,000        Cowper Close, Welling DA16
£4,950,000      Edwardes Square, London W8
£735,000        Arbuthnot Road, New Cross SE14
£750,000        Gosterwood Street, London SE8
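One thing to watch with find_previous() is that it searches backwards through the whole document, so a listing without a price element would pick up the price of the listing before it. A possible alternative, sketched below, looks up both fields inside each listing's own container. The <ul>/<li> markup and the extract_listings helper are hypothetical and chosen for illustration; the real page structure is assumed, not verified.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: one <li> per listing, the second one has no price.
html = """
<ul>
  <li>
    <a class="listing-results-price" href="#"> £460,000 <span class="price-modifier">Guide price</span></a>
    <span><a class="listing-results-address" href="#">2D Harold Road, Upper Norwood SE19</a></span>
  </li>
  <li>
    <span><a class="listing-results-address" href="#">Anerley Road, London SE20</a></span>
  </li>
</ul>
"""


def extract_listings(markup):
    """Return (price, address) pairs, searching inside each listing only."""
    soup = BeautifulSoup(markup, "html.parser")
    rows = []
    for item in soup.find_all("li"):  # assumed per-listing container
        address = item.find("a", class_="listing-results-address")
        if address is None:
            continue
        price_tag = item.find(class_="listing-results-price")
        text = price_tag.find(string=True) if price_tag is not None else None
        price = text.strip() if text else "n/a"
        rows.append((price, address.get_text(strip=True)))
    return rows


rows = extract_listings(html)
for price, address in rows:
    print("{:<15} {}".format(price, address))

# For comparison: find_previous() on the second address reaches back
# into the first listing and returns that listing's price.
soup = BeautifulSoup(html, "html.parser")
second = soup.find_all("a", class_="listing-results-address")[1]
borrowed = second.find_previous(class_="listing-results-price").find(string=True).strip()
```

Scoping the search to the container trades a little extra markup knowledge for the guarantee that a price and an address always come from the same listing.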