python - KeyError: 0 when indexing a tag
Problem description
I am trying to parse an HTML site, but I get a KeyError.
Here is the code:
from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as soup
my_url = "http://www.kontrakt.szczecin.pl/mieszkanie-sprzedaz-6664m2-339600pln-potulicka-nowe-miasto-szczecin-zachodniopomorskie,351165"
#PL: otwiera połączenie z wybraną stroną, pobieranie zawartości strony (urllib)
#EN: Opens a connection and grabs url
uClient = uReq(my_url)
page_html = uClient.read()
uClient.close()
#html parsing (BeautifulSoup)
page_soup = soup(page_html, "html.parser") # html.parser -> treat the document as HTML, not e.g. XML
#PL: zbiera tabelkę z numerami ofert, kuchnią i innymi danymi o nieruchomości z tabelki
#EN: grabs the data about real estate like kitchen, offer no, etc.
containers = page_soup.findAll("section",{"class":"clearfix"},{"id":"quick-summary"})
# print(len(containers)) - len(containers) checks how many such objects exist on the page
#PL: Co prawda na stronie jest tylko jedna taka tabelka, ale dla dobra nauki zrobię tak jak gdyby tabelek było wiele.
#EN: There is only one table, but for the sake of knowledge I do the container variable
container = containers[0]
print(len(container.dl))
print(container.dl[0])
Here is the log showing the error:
runfile('/home/bartosz/Pulpit/web_scrap.py', wdir='/home/bartosz/Pulpit')
36
Traceback (most recent call last):
File "<ipython-input-70-e826e21c585a>", line 1, in <module>
runfile('/home/bartosz/Pulpit/web_scrap.py', wdir='/home/bartosz/Pulpit')
File "/home/bartosz/anaconda3/lib/python3.6/site-packages/spyder/utils/site/sitecustomize.py", line 705, in runfile
execfile(filename, namespace)
File "/home/bartosz/anaconda3/lib/python3.6/site-packages/spyder/utils/site/sitecustomize.py", line 102, in execfile
exec(compile(f.read(), filename, 'exec'), namespace)
File "/home/bartosz/Pulpit/web_scrap.py", line 30, in <module>
print(container.dl[0])
File "/home/bartosz/anaconda3/lib/python3.6/site-packages/bs4/element.py", line 1011, in __getitem__
return self.attrs[key]
KeyError: 0
len(container.dl) shows that dl contains 36 items. If I run len(container.dl.dt), it shows 1.
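Solution
You need to access the element's children through its .contents attribute rather than by indexing the tag directly:
print(container.dl.contents[0])
That should work.
Indexing a tag directly looks up the tag's attributes. For example, given <dl class="myclass">, dl['class'] returns ['myclass'] (Beautiful Soup treats class as a multi-valued attribute, so the value is a list). That is why container.dl[0] raises KeyError: 0: the tag has no attribute named 0.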
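The difference between attribute lookup and child lookup can be seen in a minimal sketch. The markup below is a made-up stand-in for the real page, and the variable names are illustrative only.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the <dl> on the real page.
html = '<dl class="myclass"><dt>Numer oferty</dt><dd>351165</dd></dl>'
dl = BeautifulSoup(html, "html.parser").dl

attribute_value = dl["class"]   # tag[key] looks up an HTML attribute
first_child = dl.contents[0]    # .contents is the list of child nodes
child_count = len(dl)           # len(tag) is the same as len(tag.contents)

# An integer key is treated as an attribute name, which does not exist.
try:
    dl[0]
    raised = False
except KeyError:
    raised = True

print(attribute_value, first_child, child_count, raised)
```

The same reasoning explains len(container.dl.dt) being 1 in the question: a <dt> there has a single child, its text.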
Edit:
To print everything in container.dl:
from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as soup
my_url = "http://www.kontrakt.szczecin.pl/mieszkanie-sprzedaz-6664m2-339600pln-potulicka-nowe-miasto-szczecin-zachodniopomorskie,351165"
# Download the page and parse it as HTML.
with uReq(my_url) as uClient:
    page_soup = soup(uClient.read(), "html.parser")
# Both filters go in one attrs dict: the third positional argument of
# findAll is `recursive`, not a second attribute filter.
container = page_soup.findAll("section", {"class": "clearfix", "id": "quick-summary"})[0]
print(len(container.dl))
print('-' * 80)
# Iterate over the children of <dl>, not over the tag's attributes.
for content in container.dl.contents:
    print(content)
    print('-' * 80)
This prints (the first line is the length of container.dl.contents):
36
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
<dt>Numer oferty</dt>
--------------------------------------------------------------------------------
<dd>351165</dd>
--------------------------------------------------------------------------------
<dt>Liczba pokoi</dt>
--------------------------------------------------------------------------------
<dd>4</dd>
--------------------------------------------------------------------------------
<dt>Cena</dt>
--------------------------------------------------------------------------------
<dd><span class="tag price">339 600 PLN</span></dd>
--------------------------------------------------------------------------------
<dt>Cena za m2</dt>
--------------------------------------------------------------------------------
<dd>5 096 PLN</dd>
--------------------------------------------------------------------------------
<dt>Powierzchnia</dt>
--------------------------------------------------------------------------------
<dd>66,64 m2</dd>
--------------------------------------------------------------------------------
<dt>Piętro</dt>
--------------------------------------------------------------------------------
<dd>1</dd>
--------------------------------------------------------------------------------
<dt>Liczba pięter</dt>
--------------------------------------------------------------------------------
<dd>6</dd>
--------------------------------------------------------------------------------
<dt>Typ kuchni</dt>
--------------------------------------------------------------------------------
<dd>Aneks</dd>
--------------------------------------------------------------------------------
<dt>Balkon</dt>
--------------------------------------------------------------------------------
<dd>Tak</dd>
--------------------------------------------------------------------------------
<dt>Rodzaj ogrzewania</dt>
--------------------------------------------------------------------------------
<dd>CO miejskie</dd>
--------------------------------------------------------------------------------
<dt>Gorąca woda</dt>
--------------------------------------------------------------------------------
<dd>Wodociąg miejski</dd>
--------------------------------------------------------------------------------
<dt>Rodzaj budynku</dt>
--------------------------------------------------------------------------------
<dd>Wysoki blok</dd>
--------------------------------------------------------------------------------
<dt>Materiał</dt>
--------------------------------------------------------------------------------
<dd>Silikat</dd>
--------------------------------------------------------------------------------
<dt>Rok budowy</dt>
--------------------------------------------------------------------------------
<dd>2019</dd>
--------------------------------------------------------------------------------
<dt>Winda</dt>
--------------------------------------------------------------------------------
<dd>Tak</dd>
--------------------------------------------------------------------------------
<dt>Stan nieruchomości</dt>
--------------------------------------------------------------------------------
<dd>Stan deweloperski</dd>
--------------------------------------------------------------------------------
<dt>Rynek</dt>
--------------------------------------------------------------------------------
<dd>Pierwotny</dd>
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
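Since the output alternates <dt> labels with <dd> values, one possible follow-up is to pair them into a dictionary. This is only a sketch under assumptions: the inline HTML is a shortened, hypothetical copy of the page's structure, and the name summary is illustrative. Unlike .contents, find_all returns only tags, so whitespace text nodes are skipped.

```python
from bs4 import BeautifulSoup

# Shortened, hypothetical copy of the page structure (assumption).
html = """
<section class="clearfix" id="quick-summary">
<dl>
<dt>Numer oferty</dt><dd>351165</dd>
<dt>Liczba pokoi</dt><dd>4</dd>
<dt>Cena</dt><dd><span class="tag price">339 600 PLN</span></dd>
</dl>
</section>
"""
page_soup = BeautifulSoup(html, "html.parser")
dl = page_soup.find("section", id="quick-summary").dl

# Collect labels and values separately, then pair them up in order.
labels = [dt.get_text(strip=True) for dt in dl.find_all("dt")]
values = [dd.get_text(strip=True) for dd in dl.find_all("dd")]
summary = dict(zip(labels, values))

for label, value in summary.items():
    print(f"{label}: {value}")
```

Pairing by position assumes every <dt> is followed by exactly one <dd>, which matches the output shown above but may not hold for other pages.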