python - Extracting a specific element from a web page in Python using requests-html
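Question
Say I'm looking at this web page.
I want to extract the link to this doctor's profile, but when I try to scrape the page I can't find the element, even with a CSS selector.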
from requests_html import HTMLSession
firstname = 'robert'
lastname = 'b'
city = 'Palo_Alto'
url = 'https://openpaymentsdata.cms.gov/search/physicians/by-name-and-location?firstname='\
+ firstname + '&lastname=' + lastname + '&city=' + city
session = HTMLSession()
r = session.get(url)
sel = 'body > div.siteOuterWrapper > div.siteInnerWrapper > div.siteContentWrapper'
print(r.html.find(sel, first=True).text)
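This all works until I reach the content wrapper, where I can no longer see any elements. Why is that? Is there a reason I can't see this element? At first I thought it was because of JavaScript, but this library claims to have full JavaScript support: https://requests-html.kennethreitz.org/
Answer
The HTTP request below should return the data you are looking for. (To find it, open the browser's developer tools: F12 > Network > XHR.)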
HTTP GET https://openpaymentsdata.cms.gov/resource/khdp-6xuy.json?%24select=%3Aid%2Cphysician_profile_id%2Cphysician_profile_last_name%2Cphysician_profile_middle_name%2Cphysician_profile_first_name%2Cphysician_profile_suffix%2Cphysician_profile_primary_specialty%2Cphysician_profile_address_line_1%2Cphysician_profile_address_line_2%2Cphysician_profile_city%2Cphysician_profile_state%2Cphysician_profile_province_name%2Cphysician_profile_country_name%2Cphysician_profile_zipcode%2Cphysician_profile_alternate_first_name1%2Cphysician_profile_alternate_last_name1%2Cphysician_profile_alternate_first_name2%2Cphysician_profile_alternate_last_name2%2Cphysician_profile_alternate_first_name3%2Cphysician_profile_alternate_last_name3%2Cphysician_profile_alternate_first_name4%2Cphysician_profile_alternate_last_name4%2Cphysician_profile_alternate_first_name5%2Cphysician_profile_alternate_last_name5%2Clocation&%24where=STARTS_WITH(UPPER(physician_profile_first_name)%2C%20%27ROBERT%27)%20AND%20STARTS_WITH(UPPER(physician_profile_last_name)%2C%20%27B%27)%20AND%20STARTS_WITH(UPPER(physician_profile_city)%2C%20%27PALO_ALTO%27)&%24order=physician_profile_last_name%20ASC%2Cphysician_profile_first_name%20ASC&%24limit=300
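The query string in that request is easier to read and adapt when it is assembled from its parts. The following is a minimal sketch that uses only the standard library; the helper names (`starts_with`, `build_url`) and the trimmed column list are illustrative assumptions, and doubling single quotes to escape them is assumed rather than confirmed for this API.

```python
from urllib.parse import quote, urlencode

# Endpoint taken from the XHR request shown above.
BASE_URL = "https://openpaymentsdata.cms.gov/resource/khdp-6xuy.json"

# Trimmed column list for illustration; the original request selects more fields.
COLUMNS = [
    ":id",
    "physician_profile_id",
    "physician_profile_first_name",
    "physician_profile_last_name",
    "physician_profile_city",
    "physician_profile_state",
]


def starts_with(column, prefix):
    """Build one upper-cased STARTS_WITH clause, mirroring the original query."""
    # Assumption: a single quote inside the value is escaped by doubling it.
    value = prefix.upper().replace("'", "''")
    return f"STARTS_WITH(UPPER({column}), '{value}')"


def build_url(firstname, lastname, city, limit=300):
    """Return the full request URL for a name-and-city prefix search."""
    params = {
        "$select": ",".join(COLUMNS),
        "$where": " AND ".join([
            starts_with("physician_profile_first_name", firstname),
            starts_with("physician_profile_last_name", lastname),
            starts_with("physician_profile_city", city),
        ]),
        "$order": "physician_profile_last_name ASC,physician_profile_first_name ASC",
        "$limit": str(limit),
    }
    # quote_via=quote encodes spaces as %20, as in the captured request.
    return f"{BASE_URL}?{urlencode(params, quote_via=quote)}"
```

Calling `build_url('robert', 'b', 'Palo_Alto')` produces a URL with the same `$where`, `$order`, and `$limit` values as the captured request (parentheses are percent-encoded here, which the server should treat identically), and the result can be passed to `requests.get` in place of the hard-coded string.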
Using requests:
import requests
print(requests.get('https://openpaymentsdata.cms.gov/resource/khdp-6xuy.json?%24select=%3Aid%2Cphysician_profile_id%2Cphysician_profile_last_name%2Cphysician_profile_middle_name%2Cphysician_profile_first_name%2Cphysician_profile_suffix%2Cphysician_profile_primary_specialty%2Cphysician_profile_address_line_1%2Cphysician_profile_address_line_2%2Cphysician_profile_city%2Cphysician_profile_state%2Cphysician_profile_province_name%2Cphysician_profile_country_name%2Cphysician_profile_zipcode%2Cphysician_profile_alternate_first_name1%2Cphysician_profile_alternate_last_name1%2Cphysician_profile_alternate_first_name2%2Cphysician_profile_alternate_last_name2%2Cphysician_profile_alternate_first_name3%2Cphysician_profile_alternate_last_name3%2Cphysician_profile_alternate_first_name4%2Cphysician_profile_alternate_last_name4%2Cphysician_profile_alternate_first_name5%2Cphysician_profile_alternate_last_name5%2Clocation&%24where=STARTS_WITH(UPPER(physician_profile_first_name)%2C%20%27ROBERT%27)%20AND%20STARTS_WITH(UPPER(physician_profile_last_name)%2C%20%27B%27)%20AND%20STARTS_WITH(UPPER(physician_profile_city)%2C%20%27PALO_ALTO%27)&%24order=physician_profile_last_name%20ASC%2Cphysician_profile_first_name%20ASC&%24limit=300').json())
Output
[{':id': 'row-9mfk-w6hd-ejup', 'physician_profile_id': '966387', 'physician_profile_last_name': 'BOCIAN', 'physician_profile_middle_name': 'C', 'physician_profile_first_name': 'ROBERT', 'physician_profile_primary_specialty': 'Allopathic & Osteopathic Physicians|Allergy & Immunology|Allergy', 'physician_profile_address_line_1': '795 EL CAMINO REAL', 'physician_profile_city': 'PALO ALTO', 'physician_profile_state': 'CA', 'physician_profile_country_name': 'UNITED STATES', 'physician_profile_zipcode': '94301-2302', 'physician_profile_alternate_first_name1': 'ROBERT', 'physician_profile_alternate_last_name1': 'BOCIAN'}]
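Each record carries a `physician_profile_id`, which is presumably what the profile link is built from (the source does not give the profile URL pattern, so it is not reconstructed here). As a rough sketch, the ID and a display name can be pulled out of the parsed JSON like this; `summarize` is a hypothetical helper name, and the sample record is abbreviated from the output above.

```python
def summarize(records):
    """Return (profile_id, display_name) pairs from the parsed JSON records."""
    summary = []
    for record in records:
        # Empty fields (e.g. the suffix) are left out of a record entirely,
        # as the output above shows, so use .get() rather than indexing.
        name_parts = [
            record.get("physician_profile_first_name"),
            record.get("physician_profile_middle_name"),
            record.get("physician_profile_last_name"),
            record.get("physician_profile_suffix"),
        ]
        display_name = " ".join(part for part in name_parts if part)
        summary.append((record["physician_profile_id"], display_name))
    return summary


# Record abbreviated from the output above.
sample = [{
    "physician_profile_id": "966387",
    "physician_profile_last_name": "BOCIAN",
    "physician_profile_middle_name": "C",
    "physician_profile_first_name": "ROBERT",
}]
print(summarize(sample))  # [('966387', 'ROBERT C BOCIAN')]
```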