Python - How to scrape text by its location within a given page

Problem description

I am trying to get "Katowice, Brynów-Zgrzebnioka, Brynów" from the URL given in the code below:

import bs4
from urllib.request import urlopen as Open
from urllib.request import Request
from bs4 import BeautifulSoup as soup

headers = {"User-Agent": "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2228.0 Safari/537.3"}
my_url = "https://www.otodom.pl/oferta/narozne-2-pokoje-nowa-inwestycja-0-ID43FH9.html"
req = Request(url=my_url, headers=headers) 
html = Open(req).read() 

page_soup = soup(html, "html.parser")

print(page_soup.find("a", {"href":"#map"}).text)    

So far, this is the output I get:

.css-14dmk7z-Le{margin-right:2px;width:15px;height:15px;padding-bottom:2px;color:#ff7200;}.css-1g0gx4e-Le{vertical-align:middle;fill:currentColor;margin-right:2px;width:15px;height:15px;padding-bottom:2px;color:#ff7200;}Katowice, Brynów-Zgrzebnioka, Brynów

I don't know how to proceed from here; any help would be appreciated.

Tags: python-3.x, web-scraping

Solution


I'm not sure this will solve your problem 100%, but here is my solution:

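from urllib.request import urlopen as Open
from urllib.request import Request
from bs4 import BeautifulSoup as soup

# Send a browser-like User-Agent with the request.
headers = {"User-Agent": "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2228.0 Safari/537.3"}
my_url = "https://www.otodom.pl/oferta/narozne-2-pokoje-nowa-inwestycja-0-ID43FH9.html"
req = Request(url=my_url, headers=headers)
html = Open(req).read()

page_soup = soup(html, "html.parser")
# Collect every text node in the document, in document order.
# string=True is the current spelling of the older findAll(text=True).
texts = page_soup.find_all(string=True)

# Index 91 is where the location text sat on this page when the answer was
# written; it will shift if the page markup changes.
print(texts[91])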
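
Because a fixed index such as 91 depends on the exact page markup, a less fragile variant might be to stay with the a[href="#map"] element from the question and remove the embedded style elements before reading its text. The sketch below is only an illustration under assumptions: SAMPLE_HTML is a hypothetical fragment that imitates the structure suggested by the output in the question (CSS rules followed by the location), and extract_location is a helper name made up for this example. The live page may be structured differently.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment: the real page markup is assumed, not confirmed.
SAMPLE_HTML = """
<div>
  <a href="#map">
    <style>.css-14dmk7z-Le{margin-right:2px;width:15px;height:15px;}</style>
    <svg class="css-14dmk7z-Le"></svg>
    Katowice, Brynów-Zgrzebnioka, Brynów
  </a>
</div>
"""


def extract_location(html):
    """Return the visible text of the a[href="#map"] link, or None if absent."""
    page_soup = BeautifulSoup(html, "html.parser")
    link = page_soup.find("a", {"href": "#map"})
    if link is None:
        return None
    # Remove embedded <style> elements so their CSS rules are not
    # returned as part of the link text.
    for style in link.find_all("style"):
        style.decompose()
    # strip=True trims each text fragment and drops the empty ones.
    return link.get_text(strip=True)


location = extract_location(SAMPLE_HTML)
print(location)
```

With the real page, the html bytes downloaded in the code above could be passed to extract_location in place of SAMPLE_HTML. If the CSS turns out not to live in style elements inside the link, this approach would need adjusting.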
