python - Removing part of a string returned when scraping with Selenium

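Problem description

I have written code in Selenium to scrape Accor's booking website after certain information has been passed in. With this code I can scrape and return the names of all the hotels on the results page.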

from selenium import webdriver
import time

url = 'https://all.accor.com/ssr/app/accor/hotels/london/index.en.shtml?dateIn=2021-08-20&nights=8&compositions=1&stayplus=false'
driver = webdriver.Chrome(executable_path='C:\\Users\\conor\\Desktop\\diss\\chromedriver.exe')
driver.get(url)
time.sleep(10)  # wait for the results to load
working = driver.find_elements_by_class_name('hotel__wrapper')
for work in working:
    name = work.find_element_by_class_name('title__link').text
    name = name.strip()
    print(name)

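This returns all of the hotel names on the page as expected. However, it also returns an extra line with each hotel name containing the hotel's star rating, which I don't see in the HTML markup on the page. This is the output.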

Sofitel London St James
5 Star rating
The Savoy
5 Star rating
Mercure London Bloomsbury Hotel
4 Star rating
Novotel London Waterloo
4 Star rating
ibis London Blackfriars
3 Star rating
Novotel London Blackfriars
4 Star rating
Mercure London Bridge
4 Star rating
Novotel London Bridge
4 Star rating
ibis Styles London Southwark - near Borough Market
3 Star rating
Pullman London St Pancras
4 Star rating

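Is there a way to remove the extra line of rating text that is returned with the hotel names? I only want the hotel names, because I use them to compare prices across different websites. Any help is appreciated, thanks.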

Tags: python, selenium, web-scraping, selenium-chromedriver

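Since the returned text contains two lines, one with the name and the other with the rating, you can split the string on the newline and keep only the hotel name part. Here is an example: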

for work in working:
    name_with_rating = work.find_element_by_class_name('title__link').text
    name = name_with_rating.split("\n")[0]
    print(name)
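The split can also be wrapped in a small helper, which makes it easy to check without a browser. The following is only a sketch: the function name `hotel_name` is hypothetical, and the sample strings assume the link text has the "name, newline, rating" shape shown in the question's output.

```python
def hotel_name(link_text: str) -> str:
    """Return the first non-empty line of the scraped link text."""
    for line in link_text.splitlines():
        line = line.strip()
        if line:
            return line
    return ""  # nothing usable in the text


# Sample strings shaped like the output in the question (assumed format).
samples = [
    "Sofitel London St James\n5 Star rating",
    "The Savoy\n5 Star rating",
    "ibis Styles London Southwark - near Borough Market\n3 Star rating",
]

names = [hotel_name(s) for s in samples]
print(names)
```

Inside the scraping loop this would be called as hotel_name(work.find_element_by_class_name('title__link').text). Compared with split("\n")[0], it also tolerates a leading blank line or Windows-style line endings, should the page ever return them.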
