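python - Looping through a dynamic table
Problem description
Please help. I have been working on this for days and cannot figure out where I went wrong. I am trying to iterate over a table, but I only get the first row and nothing else. What am I doing wrong? I suspect my loop is the culprit, but I am still new to Python and cannot work it out. I want to end up with everything in an Excel document.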
from numpy import fabs
from selenium import webdriver
from selenium.webdriver.common.by import By
import pandas as pd
driver = webdriver.Chrome(r"C:\Users\noree\OneDrive\Documents\chromedriver.exe")
driver.get('https://www.depositaccounts.com/banks/assets.aspx?instType=&stateType=hq&state=')
driver.maximize_window()
#get url largest banks and credit unions by assets
#Show all entries - xpath for show all button
show_all_button = driver.find_element(By.XPATH,'//*[@id="results"]/div/a')
# Click 'Show all' Button
show_all_button.click()
#scrape the tables
rank = driver.find_elements(By.XPATH, '//*[@id="assetsTable"]/tbody/tr[2]/td[1]')
financial_institution = driver.find_elements(By.XPATH,'//table[@id="assetsTable"]/tbody/tr[2]/td[2]/a')
headquarters = driver.find_elements(By.XPATH, '//tbody/tr[2]/td[3]')
assets = driver.find_elements(By.XPATH, '//tbody/tr[2]/td[4]')
asset_growth = driver.find_elements(By.XPATH, '//tbody/tr[2]/td[5]')
branches = driver.find_elements(By.XPATH, '//tbody/tr[2]/td[6]')
states_with_branches = driver.find_elements(By.XPATH, '//tbody/tr[2]/td[7]')
employees = driver.find_elements(By.XPATH, '//tbody/tr[2]/td[7]')
customer_accounts = driver.find_elements(By.XPATH, '//tbody/tr[2]/td[8]')
#create empty list
bank_results = []
for i in range(len(rank)):
    temporary_data={
        'Rank': rank[i].text,
        'Financial Institution': financial_institution[i].text,
        'Headquarters': headquarters[i].text,
        'Assets': assets[i].text,
        'Asset Growth': asset_growth[i].text,
        'Branches': branches[i].text,
        'States with Branches': states_with_branches[i].text,
        'Employees': employees[i].text,
        'Customer Accounts': customer_accounts[i].text
    }
    bank_results.append(temporary_data)
df_data = pd.DataFrame(bank_results)
df_data
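Solution
Your error comes from selecting only one record: every XPath in your code contains tr[2], so each find_elements call matches the cells of that single row and the loop body runs only once.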
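The effect can be reproduced offline. The following is a minimal sketch using the standard library's xml.etree.ElementTree on a small made-up table; the HTML and the bank names are illustrative assumptions, not data from the site. ElementTree supports only a subset of XPath, but a positional predicate such as tr[2] behaves the same way here:

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the page's table: one header row, three data rows.
HTML = """
<table id="assetsTable">
  <tbody>
    <tr><th>Rank</th><th>Financial Institution</th></tr>
    <tr><td>1</td><td>Bank A</td></tr>
    <tr><td>2</td><td>Bank B</td></tr>
    <tr><td>3</td><td>Bank C</td></tr>
  </tbody>
</table>
"""

root = ET.fromstring(HTML)

# tr[2] pins the match to the second row only (the first data row).
only_second_row = [td.text for td in root.findall("./tbody/tr[2]/td[1]")]

# Without the index, the first td of every row is matched.
# The header row has no td cells, so it contributes nothing.
all_rows = [td.text for td in root.findall("./tbody/tr/td[1]")]

print(only_second_row)  # ['1']
print(all_rows)         # ['1', '2', '3']
```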
If you want a solution that stays close to your own code:
import time

from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
# ... (remaining imports and driver setup as in the question)

url='https://www.depositaccounts.com/banks/assets.aspx?instType=&stateType=hq&state='
driver.get(url)
# Wait until the 'Show all' link is clickable, then click it
WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.XPATH, '//*[@id="results"]/div/a'))).click()
# Give the page a moment to render the full table
time.sleep(3)
# One WebElement per data row; rows that contain a th (headers) are excluded
records = driver.find_elements(By.XPATH, "//table[@id='assetsTable']//tr[not(./th)]")
nbr_records = len(records)
bank_results = []
for i in range(nbr_records):
    # Each "./td[n]" is evaluated relative to the current row
    temporary_data={
        'Rank': records[i].find_element(By.XPATH, "./td[1]").text,
        'Financial Institution': records[i].find_element(By.XPATH, "./td[2]").text,
        'Headquarters': records[i].find_element(By.XPATH, "./td[3]").text,
        'Assets': records[i].find_element(By.XPATH, "./td[4]").text,
        'Asset Growth': records[i].find_element(By.XPATH, "./td[5]").text,
        'Branches': records[i].find_element(By.XPATH, "./td[6]").text,
        'States with Branches': records[i].find_element(By.XPATH, "./td[7]").text,
        'Employees': records[i].find_element(By.XPATH, "./td[8]").text,
        'Customer Accounts': records[i].find_element(By.XPATH, "./td[9]").text
    }
    bank_results.append(temporary_data)
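The question asks for the result in an Excel document. Here is a minimal sketch of that last step, assuming bank_results is the list of dicts built by the loop and that pandas has an Excel writer engine such as openpyxl installed. The helper names, the file name banks.xlsx, and the sample rows are hypothetical:

```python
import pandas as pd


def results_to_frame(bank_results):
    """Turn the list of per-row dicts into a DataFrame, one column per key."""
    return pd.DataFrame(bank_results)


def save_to_excel(bank_results, path="banks.xlsx"):
    """Write the rows to an .xlsx file (needs an engine such as openpyxl)."""
    results_to_frame(bank_results).to_excel(path, index=False)


# Made-up rows standing in for the scraped data.
sample = [
    {"Rank": "1", "Financial Institution": "Bank A"},
    {"Rank": "2", "Financial Institution": "Bank B"},
]
df = results_to_frame(sample)
print(df)
```

Calling save_to_excel(bank_results) after the loop would then write the file to the current working directory.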
Approach
xpath = "//table[@id='assetsTable']//tr[not(./th)]"
Select every tr that does not contain a th child and that has the table with id = assetsTable as an ancestor.
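The same row-first idea can be tried without a browser. This sketch uses the standard library's xml.etree.ElementTree on a made-up table. ElementTree's XPath subset has no not() function, so the header rows are filtered in Python instead; that is a substitute for the tr[not(./th)] expression, not the same call. The HTML, the column list, and the helper name rows_to_dicts are illustrative assumptions:

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the page's table.
HTML = """
<table id="assetsTable">
  <tbody>
    <tr><th>Rank</th><th>Financial Institution</th><th>Headquarters</th></tr>
    <tr><td>1</td><td>Bank A</td><td>New York, NY</td></tr>
    <tr><td>2</td><td>Bank B</td><td>Charlotte, NC</td></tr>
  </tbody>
</table>
"""

COLUMNS = ["Rank", "Financial Institution", "Headquarters"]


def rows_to_dicts(table, columns):
    """Collect one dict per data row, skipping rows that contain a th."""
    records = []
    for tr in table.iter("tr"):
        if tr.find("th") is not None:
            continue  # header row
        # Cells are read relative to the current row, like "./td[n]".
        cells = [td.text for td in tr.findall("td")]
        records.append(dict(zip(columns, cells)))
    return records


table = ET.fromstring(HTML)
bank_results = rows_to_dicts(table, COLUMNS)
print(bank_results)
```

Selecting the rows first and then reading cells relative to each row keeps all values of one record together, so the columns cannot drift out of step with one another.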