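python - Scraping a dynamically generated table with Python 3 and Selenium
Question
I'm new to Python and am trying to scrape a dynamically generated table. I've gotten far enough to open the page, enter a search, and have the results table displayed. I'm having trouble scraping the results, and I noticed that the actual text of the results is not part of the HTML. Here is my code so far; thanks for your help.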
## module importation
import os, requests, bs4, openpyxl, webbrowser, lxml, html5lib, re
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
print('Type in the FIRST NAME of the individual.')
#I've been using [Mike] here.
firstName = input()
print('Thanks. Now type in the individual\'s LAST NAME.')
#I've been using [Jones] here.
lastName = input()
browser = webdriver.Firefox(executable_path='/usr/local/bin/geckodriver')
#BoP inmate locator
#Goes to BoP website
browser.get('https://www.bop.gov/inmateloc/')
res = requests.get('https://www.bop.gov/inmateloc/')
#Clicks Search by name option (just in case)
searchByNameButton = browser.find_element_by_css_selector("#ui-id-1")
searchByNameButton.click() # clicks the Search by Name Button
#enters first name
bopSearchFirstNameElem = browser.find_element_by_css_selector('#inmNameFirst')
bopSearchFirstNameElem.send_keys(firstName)
#enters last name
bopSearchLastNameElem = browser.find_element_by_css_selector('#inmNameLast')
bopSearchLastNameElem.send_keys(lastName)
# Clicks search
searchSubmitButton = browser.find_element_by_css_selector('#searchNameButton')
searchSubmitButton.click() # clicks the Search Button on the BoP page
# Scrape table results
bopResultsPage = bs4.BeautifulSoup(res.text, 'html.parser')
Solution
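This works. The results table is filled in by JavaScript after the search is submitted, so the text has to be read from the Selenium-controlled browser; the separate requests.get call in the question only fetches the initial page, which contains no results.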
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions
from selenium.webdriver.support.ui import WebDriverWait

# Note: this is written against the Selenium 3 API. Recent Selenium 4 releases
# removed the find_element_by_* helpers (use find_element(By.CSS_SELECTOR, ...))
# and the executable_path argument.
firstName = input('Insert your first name: ')
lastName = input('Insert your last name: ')

browser = webdriver.Firefox(executable_path='/usr/local/bin/geckodriver')
browser.get('https://www.bop.gov/inmateloc/')
browser.implicitly_wait(2)  # wait up to 2 seconds for elements to appear

# Select the "search by name" tab, fill in the form and submit it
browser.find_element_by_css_selector("#ui-id-1").click()
browser.find_element_by_css_selector('#inmNameFirst').send_keys(firstName)
browser.find_element_by_css_selector('#inmNameLast').send_keys(lastName)
browser.find_element_by_css_selector('#searchNameButton').click()

# Block for up to 5 seconds until the results header has been rendered
WebDriverWait(browser, 5).until(expected_conditions.text_to_be_present_in_element((By.XPATH, '//*[@id="nameBriefTd"]'), 'Results for search'))

# Print every cell of every result row, with a blank line between rows
for row in browser.find_elements_by_xpath('//*[@id="inmateTable"]/tbody/tr'):
    for cell in row.find_elements_by_xpath('td'):
        print(cell.text)
    print()

browser.close()
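If you would rather parse the rendered HTML yourself, as the question tried to do with BeautifulSoup, the string to parse is browser.page_source (read after the wait has succeeded), not the text of a separate requests response. Below is a minimal sketch of that idea using only the standard library so that it runs without extra packages. TableRowParser and extract_table_rows are hypothetical helper names introduced for illustration, and the sketch assumes the table is well-formed HTML with explicit closing tags and no nested tables.

```python
from html.parser import HTMLParser


class TableRowParser(HTMLParser):
    """Collect the cell text of every <tr> inside the table with a given id.

    Hypothetical helper for illustration; not part of Selenium or bs4.
    """

    def __init__(self, table_id):
        super().__init__()
        self.table_id = table_id
        self.rows = []           # finished rows, each a list of cell strings
        self._in_table = False   # True while inside the target <table>
        self._row = None         # cells of the row currently being read
        self._cell = None        # text fragments of the current cell

    def handle_starttag(self, tag, attrs):
        if tag == 'table' and dict(attrs).get('id') == self.table_id:
            self._in_table = True
        elif self._in_table and tag == 'tr':
            self._row = []
        elif self._row is not None and tag in ('td', 'th'):
            self._cell = []

    def handle_endtag(self, tag):
        if not self._in_table:
            return
        if tag in ('td', 'th') and self._cell is not None:
            # Join the fragments and trim surrounding whitespace
            self._row.append(''.join(self._cell).strip())
            self._cell = None
        elif tag == 'tr' and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None
        elif tag == 'table':
            self._in_table = False

    def handle_data(self, data):
        # Only text that sits inside a cell of the target table is kept
        if self._cell is not None:
            self._cell.append(data)


def extract_table_rows(html, table_id):
    """Return the rows of the table with id `table_id` as lists of strings."""
    parser = TableRowParser(table_id)
    parser.feed(html)
    return parser.rows
```

With the browser from the answer still open, a call such as extract_table_rows(browser.page_source, 'inmateTable') would return the header and result rows as lists of strings. With bs4 installed, the equivalent starting point would be bs4.BeautifulSoup(browser.page_source, 'html.parser').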