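python - How do I stop scraping when there are no more tables to pull?
Problem description
How can I stop my for loop from moving on to the next page when the current page has no more pandas tables to pull?
Right now I have it coded to run through a fixed pageRange + 1, but I have no way of knowing when there are no more pages to parse.
Thanks!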
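import random
import string
import time
from datetime import datetime

import pandas as pd

# NOTE: `driver` (a logged-in Selenium WebDriver) and `start_time` (set with
# time.time()) are created earlier in the script and are not shown here.
# set variables for url to pull from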
wmsLoggedIn = "https://enterurlhere/solution/entitylist.htm?" #root logged in page
pageSize = 300 # how many results per page
pageNo = 0 # starting page; starts at 0
entityName = "Shipment" # entity of the view
currentView = 51339 # view id to pull data from
pageRange = 2 # total pages returned on wms view
savefilesin = "G:\\webscraper\\Scraped Files\\"
# set variables for filename function
now = datetime.now()
randomstring = ''.join([random.choice(string.ascii_letters + string.digits) for n in range(6)])

def generate_filename():
    return str(entityName) + "-" + str(now.strftime("%m%d%Y-%H%M%S")) + "-" + str(randomstring) + ".csv"
modified_dfs = []
for pageNo in range(pageRange + 1):
    urlSpecifics = "entityName={entityName}&pageNo={pageNo}&pageSize={pageSize}&currentViewId={currentView}".format(
        entityName=entityName, pageNo=pageNo, pageSize=pageSize, currentView=currentView)
    # open entity page
    driver.get(wmsLoggedIn + urlSpecifics)
    html = driver.page_source
    # clean every matching table on the page and keep it for the final CSV
    for df in pd.read_html(html, attrs={"class": "roundedTable"}, header=5):
        df.dropna(how="all", axis="columns", inplace=True)
        df.drop(columns=["No"], inplace=True)
        df.dropna(how="all", axis=0, inplace=True)
        modified_dfs.append(df)
# write all pages to a single CSV once the loop has finished
pd.concat(modified_dfs).to_csv(savefilesin + generate_filename(), index=False)
elapsed_time = time.time() - start_time # capture total time it took for the process to run
print("Process Success and saved the file as: " + generate_filename() + " It took " + time.strftime("%H:%M:%S", time.gmtime(elapsed_time)) + " to process!")
driver.quit()
Solution
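One possible approach, offered as a sketch rather than a confirmed fix for this particular site: instead of looping over a fixed pageRange, keep requesting pages until one of them has no table. pandas.read_html raises ValueError when nothing in the HTML matches, so that exception can serve as the stop signal. The helper below is hypothetical (the names collect_pages, read_page and max_pages are not from the question) and is kept independent of Selenium and pandas so the stopping logic can be seen on its own.

```python
def collect_pages(read_page, max_pages=1000):
    """Call read_page(0), read_page(1), ... until a page has no tables.

    read_page is a hypothetical callable that returns a list of tables for
    one page number. "No more data" is signalled either by an empty list or
    by ValueError, which is what pandas.read_html raises when no table
    matches. max_pages is a safety cap so the loop cannot run forever.
    """
    collected = []
    for page_no in range(max_pages):
        try:
            tables = read_page(page_no)
        except ValueError:
            break  # no matching table on this page: stop paging
        if not tables:
            break  # page loaded but returned nothing: stop paging
        collected.extend(tables)
    return collected
```

With the code from the question, read_page could be a small function that builds the URL for the given page number, calls driver.get(...), and returns pd.read_html(driver.page_source, attrs={"class": "roundedTable"}, header=5). The result of collect_pages would then take the place of modified_dfs. If the site still renders an empty roundedTable past the last page, read_html will not raise, so it may be necessary to check df.empty after the dropna calls and stop on that instead.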