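python - Scraping a web page with Python and saving the result to a pandas DataFrame
Problem description
I am trying to scrape a house listing from a remax page and save that information to a pandas DataFrame. But for some reason it keeps giving me a KeyError. Here is my code: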
import pandas as pd
import requests
from bs4 import BeautifulSoup
url = 'https://www.remax.ca/ab/calgary-real-estate/720-37-st-nw-wp_id251536557-lst'
response = requests.get(url)
soup = BeautifulSoup(response.text, 'html.parser')
detail_title = soup.find_all(class_='detail-title')
details_t = pd.DataFrame(detail_title)
This is the error I get:
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
<ipython-input-6-3be49b8e4cfc> in <module>
6 soup = BeautifulSoup(response.text, 'html.parser')
7 detail_title = soup.find_all(class_='detail-title')
----> 8 details_t = pd.DataFrame(detail_title)
~/anaconda3/lib/python3.7/site-packages/pandas/core/frame.py in __init__(self, data, index, columns, dtype, copy)
449 else:
450 mgr = init_ndarray(data, index, columns, dtype=dtype,
--> 451 copy=copy)
452 else:
453 mgr = init_dict({}, index, columns, dtype=dtype)
~/anaconda3/lib/python3.7/site-packages/pandas/core/internals/construction.py in init_ndarray(values, index, columns, dtype, copy)
144 # by definition an array here
145 # the dtypes will be coerced to a single dtype
--> 146 values = prep_ndarray(values, copy=copy)
147
148 if dtype is not None:
~/anaconda3/lib/python3.7/site-packages/pandas/core/internals/construction.py in prep_ndarray(values, copy)
228 try:
229 if is_list_like(values[0]) or hasattr(values[0], 'len'):
--> 230 values = np.array([convert(v) for v in values])
231 elif isinstance(values[0], np.ndarray) and values[0].ndim == 0:
232 # GH#21861
~/anaconda3/lib/python3.7/site-packages/bs4/element.py in __getitem__(self, key)
1014 """tag[key] returns the value of the 'key' attribute for the tag,
1015 and throws an exception if it's not there."""
-> 1016 return self.attrs[key]
1017
1018 def __iter__(self):
KeyError: 0
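Any help would be greatly appreciated!
Solution
You can try this. I am assuming you only need the text inside the <span> tags, but feel free to adapt my working example.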
import pandas as pd
import requests
from bs4 import BeautifulSoup
url = 'https://www.remax.ca/ab/calgary-real-estate/720-37-st-nw-wp_id251536557-lst'
response = requests.get(url)
soup = BeautifulSoup(response.text, 'html.parser')
detail_title = soup.find_all(class_='detail-title')
ls = []
for _ in detail_title:
    ls.append(_.text)  # keep only the tag's text, not the Tag object
df = pd.DataFrame(data=ls)
print(df)
Output
0
0 Property Type:
1 Property Tax:
2 Last Updated:
3 Property Sub Type:
4 MLS® #:
5 Ownership-Type:
6 Year Built:
7 sqft:
8 Date Listed:
9 Lot Size:
10 Occupancy:
11 Subdivision:
12 Heating:
13 Heating Source:
14 Full Bathrooms:
15 Half Bathrooms:
16 Rooms:
17 Basement:
18 Basement Development:
19 Flooring:
20 Parking Spaces:
21 Parking:
22 Area:
23 Exterior:
24 Foundation:
25 Roof:
26 Faces:
27 Miscellaneous Features:
28 Lot Description:
29 Condo:
30 Board ID:
31 Suite:
32 Features:
Edit:
print(type(detail_title))
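gives <class 'bs4.element.ResultSet'>, which pandas cannot turn into a DataFrame directly. A ResultSet holds bs4 Tag objects, and, as the last frame of the traceback shows, pandas ends up indexing a Tag with an integer; bs4 treats tag[key] as an attribute lookup, hence KeyError: 0. From https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.html
data: ndarray (structured or homogeneous), Iterable, dict, or DataFrame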
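To make the point above concrete, here is a minimal sketch that runs without a network request. The HTML string is a made-up stand-in for the listing page (only the detail-title class name comes from the question), and the column name detail_title is an arbitrary choice. It shows that integer indexing on a Tag raises the same KeyError, and that extracting plain strings first avoids it.

```python
import pandas as pd
from bs4 import BeautifulSoup

# Hypothetical stand-in markup; the real page is larger and may differ.
html = """
<ul>
  <li><span class="detail-title">Property Type:</span></li>
  <li><span class="detail-title">Year Built:</span></li>
</ul>
"""

soup = BeautifulSoup(html, "html.parser")
tags = soup.find_all(class_="detail-title")  # a ResultSet of Tag objects

# tag[key] is an attribute lookup in bs4, so an integer key fails
# the same way it does inside pd.DataFrame(detail_title).
try:
    tags[0][0]
    lookup_error = None
except KeyError as exc:
    lookup_error = exc

# Convert each Tag to a plain string before handing the data to pandas.
titles = [tag.get_text(strip=True) for tag in tags]
df = pd.DataFrame({"detail_title": titles})
print(df)
```

Passing a dict with a named column is optional; pd.DataFrame(titles) works as well and produces the default column label 0, as in the output above.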