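python - How to scrape data from elements that share the same class name
Problem description
I'm trying to scrape a real-estate website, but the page I ran into has several divs with the same class name under one parent div, and each of those divs contains two more divs whose class names also repeat. I want to scrape the data in the child class (I think).
I want to scrape the data from the following class: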
<div class="m-srp-card__summary__info">New Property</div>
Below is the entire block of markup I'm trying to scrape:
<div class="m-srp-card__collapse js-collapse" aria-collapsed="collapsed" data-container="srp-card-summary">
<div class="m-srp-card__summary js-collapse__content" data-content="srp-card-summary">
<input type="hidden" id="propertyArea42679361" value="888 sqft">
<div class="m-srp-card__summary__item">
<div class="m-srp-card__summary__title">carpet area</div>
<div class="m-srp-card__summary__info">888 sqft</div>
</div>
<div class="m-srp-card__summary__item">
<div class="m-srp-card__summary__title">status</div>
<div class="m-srp-card__summary__info">Ready to Move</div>
</div>
<div class="m-srp-card__summary__item">
<div class="m-srp-card__summary__title">floor</div>
<div class="m-srp-card__summary__info">9 out of 13 floors</div>
</div>
<div class="m-srp-card__summary__item">
<div class="m-srp-card__summary__title">transaction</div>
<div class="m-srp-card__summary__info">New Property</div>
</div>
<div class="m-srp-card__summary__item">
<div class="m-srp-card__summary__title">furnishing</div>
<div class="m-srp-card__summary__info">Unfurnished</div>
</div>
<div class="m-srp-card__summary__item">
<div class="m-srp-card__summary__title">facing</div>
<div class="m-srp-card__summary__info">South -West</div>
</div>
<div class="m-srp-card__summary__item">
<div class="m-srp-card__summary__title">overlooking</div>
<div class="m-srp-card__summary__info">Garden/Park, Main Road</div>
</div>
<div class="m-srp-card__summary__item">
<div class="m-srp-card__summary__title">society</div>
<div class="m-srp-card__summary__info">
<a id="project-link-42679361" class="m-srp-card__summary__link"
href="https://www.magicbricks.com/skylights-bopal-ahmedabad-pdpid-4d4235303936323633"
target="_blank">Skylights</a>
</div>
</div>
<div class="m-srp-card__summary__item">
<div class="m-srp-card__summary__title">car parking</div>
<div class="m-srp-card__summary__info">1 Covered</div>
</div>
<div class="m-srp-card__summary__item">
<div class="m-srp-card__summary__title">bathroom</div>
<div class="m-srp-card__summary__info">3</div>
</div>
<div class="m-srp-card__summary__item">
<div class="m-srp-card__summary__title">balcony</div>
<div class="m-srp-card__summary__info">2</div>
</div>
<div class="m-srp-card__summary__item">
<div class="m-srp-card__summary__title">ownership</div>
<div class="m-srp-card__summary__info">Co-operative Society</div>
</div>
</div>
<div class="m-srp-card__collapse__control js-collapse__control" data-toggle="list-collapse"
data-target="srp-card-summary" onclick="stopPage=true;">
<div class="ico m-srp-card__ico">
<svg role="icon">
<use xlink:href="#icon-caret-down"></use>
</svg>
</div>
I tried indexing, but got nothing.
Below is my code:
req = Request('https://www.magicbricks.com/property-for-sale/residential-real-estate?proptype=Multistorey-Apartment,Builder-Floor-Apartment,Penthouse,Studio-Apartment,Residential-House,Villa&Locality=Bopal&cityName=Ahmedabad', headers={'User-Agent': 'Mozilla/5.0'})
webpage = urlopen(req).read()
soup = BeautifulSoup(req, 'html.parser')
containers = soup.find_all('div', {'class': 'm-srp-card__desc flex__item'})
container = containers[0]
no_apartment = container.find('h3').find('span', {'class': 'm-srp-card__title__bhk'}).getText()
c_area = container.find('div', {'class': 'm-srp-card__summary__info'}).getText()
p_price = container.find('div', {'class': 'm-srp-card__info flex__item'})
p_type = container.find('div', {'class': 'm-srp-card__summary js-collapse__content'})[3].find('div', {'class': 'm-srp-card__summary__info'})
Thanks in advance!
Solution
import csv
import re

import requests
from bs4 import BeautifulSoup

r = requests.get('https://www.magicbricks.com/property-for-sale/residential-real-estate?proptype=Multistorey-Apartment,Builder-Floor-Apartment,Penthouse,Studio-Apartment,Residential-House,Villa&Locality=Bopal&cityName=Ahmedabad')
soup = BeautifulSoup(r.text, 'html.parser')

category = []
size = []
price = []
floor = []

# Apartment type (the BHK title) of each listing.
for item in soup.find_all('span', {'class': 'm-srp-card__title__bhk'}):
    category.append(item.get_text(strip=True))

# Match any label ending in "area" (e.g. "carpet area"); its value is in the next div.
for item in soup.find_all(string=re.compile('area$')):
    size.append(item.find_next('div').text)

for item in soup.find_all('span', {'class': 'm-srp-card__price'}):
    price.append(item.text)

# The "floor" label is followed by the div that holds its value.
for item in soup.find_all(string='floor'):
    floor.append(item.find_next('div').text)

# zip() stops at the shortest list, so rows only line up if every listing has all four fields.
data = list(zip(category, size, price, floor))

with open('output.csv', 'w+', newline='', encoding='UTF-8-SIG') as file:
    writer = csv.writer(file)
    writer.writerow(['Category', 'Size', 'Price', 'Floor'])
    writer.writerows(data)

print("Operation Completed")
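For the specific value asked about (the "transaction" entry), one alternative is to pair each title with the info div that sits in the same item, rather than relying on position. This is a minimal sketch, assuming the page markup matches the snippet in the question. SAMPLE_HTML and summary_to_dict are hypothetical names introduced here, and the HTML is a trimmed copy of that snippet rather than the live page:

```python
from bs4 import BeautifulSoup

# Trimmed copy of the summary markup from the question (an assumption:
# the live page may differ). The href is shortened for illustration.
SAMPLE_HTML = """
<div class="m-srp-card__summary js-collapse__content">
  <div class="m-srp-card__summary__item">
    <div class="m-srp-card__summary__title">carpet area</div>
    <div class="m-srp-card__summary__info">888 sqft</div>
  </div>
  <div class="m-srp-card__summary__item">
    <div class="m-srp-card__summary__title">transaction</div>
    <div class="m-srp-card__summary__info">New Property</div>
  </div>
  <div class="m-srp-card__summary__item">
    <div class="m-srp-card__summary__title">society</div>
    <div class="m-srp-card__summary__info">
      <a class="m-srp-card__summary__link" href="#">Skylights</a>
    </div>
  </div>
</div>
"""


def summary_to_dict(summary):
    """Map each title in one summary block to the text of its info div."""
    fields = {}
    for item in summary.select("div.m-srp-card__summary__item"):
        title = item.select_one("div.m-srp-card__summary__title")
        info = item.select_one("div.m-srp-card__summary__info")
        if title is None or info is None:
            continue  # skip items that lack either half of the pair
        fields[title.get_text(strip=True)] = info.get_text(" ", strip=True)
    return fields


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
summary = soup.select_one("div.m-srp-card__summary")
fields = summary_to_dict(summary)
print(fields.get("transaction"))  # prints "New Property"
```

Looking fields up by title means a listing that lacks one entry simply returns None from fields.get(...) instead of shifting every later value by one position, which is what happens when indexing into a list of same-class divs.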