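python - Scrapy: extracting data
Problem description
I have a spider that collects the links to profile pages, but it does not collect any information from the profile pages themselves. What is the problem?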
import scrapy


class DoctorsSpider(scrapy.Spider):
    # The class name and spider name are placeholders; the question shows only the class body.
    name = 'doctors'

    custom_settings = {
        'USER_AGENT': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.103 Safari/537.36 OPR/60.0.3255.50747 OPRGX/60.0.3255.50747',
        'DOWNLOAD_DELAY': 2.5,
    }

    def start_requests(self):
        yield scrapy.Request('https://www.ratemds.com/best-doctors/?specialty=acupuncturist', callback=self.profile_link)

    def profile_link(self, response):
        # Follow every doctor profile link on the listing page.
        for a in response.css('.search-item-doctor-link'):
            yield response.follow(a, callback=self.profile)

        # Use the last pagination link as the next page. Guard against an empty
        # result, because indexing an empty SelectorList with [-1] raises IndexError.
        pages = response.css('.pagination-sm a::attr(href)').getall()
        next_page = pages[-1] if pages else None
        if next_page is not None:
            next_page = response.urljoin(next_page)
            yield scrapy.Request(next_page, callback=self.profile_link)

    def profile(self, response):
        # Extract the profile fields; .get() returns None and .getall() returns []
        # when a selector matches nothing.
        item = {
            'url': response.request.url,
            'Image': response.css('.doctor-profile-image::attr(src)').get(),
            'First_and_Last_Name': response.css('h1::text').get(),
            'Position': response.css('.col-sm-6 .search-item-info~ .search-item-info+ .search-item-info span::text').getall(),
            'Reviews': response.css('.star-rating-count span span::text').get(),
            'Gender': response.css('.fa-male+ a::text').get(),
            'Facilities': response.css('.search-item-extra a span::text').getall(),
        }
        yield item
Output:
{'url': 'https://www.ratemds.com/doctor-ratings/dr-zach-olesinski-toronto-on-ca', 'Image': None, 'First_and_Last_Name': None, 'Position': [], 'Reviews': None, 'Gender': None, 'Facilities': [], 'Social_Media_link': None, 'Staff': 0, 'Punctuality': 0, 'Helpfulness': 0, 'Knowledge': 0, 'Comments': [], 'Facility_Affiliations': [], 'Accepting_New_Patients': None, 'Languages': [], 'Education': [], 'Other_Specialties': [], 'Areas_of_Expertise': [], 'Awards_Recognitions': [], 'Publications_Research': [], 'Insurance_accepted_by_this_Doctor': []}
Solution
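The answer text is not included on this page. Since every field in the output is empty, including the plain `h1::text`, one plausible explanation (an assumption, not confirmed by the source) is that the HTML Scrapy downloads for a profile page does not contain the elements the selectors target, for example because the page is rendered by JavaScript in the browser. The sketch below is one rough way to check this using only the standard library: it lists which of the class names used in `profile()` never occur in the raw HTML. `ClassCollector`, `missing_classes`, `WANTED`, and `SAMPLE_HTML` are hypothetical names introduced for illustration, and `SAMPLE_HTML` is an invented stand-in for `response.text`, not the real page.

```python
from html.parser import HTMLParser


class ClassCollector(HTMLParser):
    """Collect every CSS class name that appears in an HTML document."""

    def __init__(self):
        super().__init__()
        self.classes = set()

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None.
        for name, value in attrs:
            if name == "class" and value:
                self.classes.update(value.split())


def missing_classes(html, wanted):
    """Return the class names from `wanted` that never occur in `html`."""
    collector = ClassCollector()
    collector.feed(html)
    return [c for c in wanted if c not in collector.classes]


# Class names taken from the selectors used in profile().
WANTED = [
    "doctor-profile-image",
    "star-rating-count",
    "search-item-info",
    "search-item-extra",
    "fa-male",
]

# Hypothetical stand-in for response.text: a page whose body is filled in
# by JavaScript, so the raw HTML carries none of the targeted classes.
SAMPLE_HTML = (
    '<html><body><div id="app"></div>'
    '<script src="/bundle.js"></script></body></html>'
)

print(missing_classes(SAMPLE_HTML, WANTED))
```

Inside the spider, the same check could be run as `missing_classes(response.text, WANTED)` at the top of `profile()`. If most of the class names are reported missing, the selectors were probably written against the browser-rendered DOM rather than the downloaded HTML. Opening the page with `scrapy shell <url>` and calling `view(response)` shows what Scrapy actually received.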