python - Small Python program that takes 15 minutes to run
Problem description
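So my code works as expected, but this part of it takes 15 minutes to run, and most of that time is spent in the dataframe section.
I'm new to Python, so how can I make this code more efficient?
I think the culprit is the for loop that iterates over the dataframe. How can I change it so that, instead of looping over every row, it performs the calculation on the dataframe columns all at once?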
import csv
import requests
import json
import openpyxl
import xml.etree.ElementTree as ET
import numpy as np
import pandas as pd
import xlrd
import math
import time
start_time = time.time()
postcode_geo = []
with open (r'Desktop\Python Projects\Work - Transport Movement\open_postcode_geo.csv', 'r') as csvfile:
    reader = csv.reader(csvfile)
    for row in reader:
        postcode_geo.append(row)
#function to find lat,lon of given postcode
postcode_geo_postcodes = [x[0] for x in postcode_geo]
def FindLL(postcode):
    if postcode in postcode_geo_postcodes:
        for x in range(0,len(postcode_geo)):
            if postcode == postcode_geo[x][0]:
                latitude = postcode_geo[x][7]
                longitude = postcode_geo[x][8]
                return f'{longitude},{latitude}'
    else:
        print("Postcode can't find longitude/latitude, please check format")
#user inputs start and end postcode
start_postcode = input("Enter start postcode here (with spaces):")
end_postcode = input("Enter end postcode here (with spaces):")
#find lat,lon of postcodes
StartLL = FindLL(start_postcode)
EndLL = FindLL(end_postcode)
print(StartLL)
print(EndLL)
#Get nodes from project osrm API Example = #http://router.project-osrm.org/route/v1/driving/-0.2507693,51.364718;-0.3795724,51.6110899?alternatives=false&annotations=nodes
route = requests.get(f'http://router.project-osrm.org/route/v1/driving/{StartLL};{EndLL}?alternatives=false&annotations=nodes')
routejson = route.json()
route_nodes = routejson['routes'][0]['legs'][0]['annotation']['nodes']
print(route_nodes)
#Turn nodes into longitude and latitude
RouteNodeLL = []
for node in route_nodes:
    response_xml = requests.get(f'https://api.openstreetmap.org/api/0.6/node/{node}')
    response_xml_as_string = response_xml.content
    responseXml = ET.fromstring(response_xml_as_string)
    for child in responseXml.iter('node'):
        RouteNodeLL.append((float(child.attrib['lat']), float(child.attrib['lon'])))
#create dataframe of current locations and add columns showing distance from nodes
df = pd.read_excel(r'Desktop\Python Projects\Work - Transport Movement\CurrentYodelTransportLocations.xlsx')
df['Latitude v2'] = df['Latitude v2'].astype(float)
for waypointLat, waypointLong in RouteNodeLL:
    for label, row in df.iterrows():
        df.loc[label,f'Distance From Node - {waypointLat}, {waypointLong}'] = (((math.acos(math.sin((row['Latitude v2']*math.pi/180)) * math.sin((waypointLat*math.pi/180))+math.cos((row['Latitude v2']*math.pi/180)) * math.cos((waypointLat*math.pi/180)) * math.cos(((row['Longitude v2'] - waypointLong )*math.pi/180))))*180/math.pi)*60*1.1515*1.609344)
print(df)
print("--- %s seconds ---" % (time.time() - start_time))
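The question suspects the nested loop, and that is indeed where most of the row-level work happens: for every route node, iterrows() walks the whole dataframe and df.loc writes one cell at a time. A minimal sketch of a vectorized alternative, assuming the 'Latitude v2' and 'Longitude v2' columns from the question: the same spherical-law-of-cosines formula (with the same degrees to nautical miles to statute miles to km constants) is applied to whole columns with NumPy, once per node. The helper name distance_km and the sample coordinates are hypothetical. One deliberate difference: the cosine is clipped to [-1, 1], because rounding can push it just past 1 for identical points, which makes math.acos raise and np.arccos return NaN.

```python
import numpy as np
import pandas as pd


def distance_km(lat, lon, node_lat, node_lon):
    """Great-circle distance in km using the spherical law of cosines.

    Hypothetical helper mirroring the question's formula; lat/lon may be
    scalars or NumPy arrays (whole dataframe columns).
    """
    lat1 = np.radians(lat)
    lat2 = np.radians(node_lat)
    dlon = np.radians(lon - node_lon)
    cos_angle = np.sin(lat1) * np.sin(lat2) + np.cos(lat1) * np.cos(lat2) * np.cos(dlon)
    # Clip guards against floating-point overshoot outside [-1, 1].
    angle = np.arccos(np.clip(cos_angle, -1.0, 1.0))
    # radians -> degrees -> nautical miles -> statute miles -> km
    return np.degrees(angle) * 60 * 1.1515 * 1.609344


# Made-up sample data standing in for the Excel sheet and RouteNodeLL.
df = pd.DataFrame({
    "Latitude v2": [51.364718, 51.6110899, 52.0],
    "Longitude v2": [-0.2507693, -0.3795724, -1.0],
})
RouteNodeLL = [(51.364718, -0.2507693), (51.5, -0.3)]

lats = df["Latitude v2"].to_numpy()
lons = df["Longitude v2"].to_numpy()

# One vectorized call per node; all new columns are collected first
# and attached in a single concat instead of cell-by-cell df.loc writes.
new_cols = {
    f"Distance From Node - {node_lat}, {node_lon}": distance_km(lats, lons, node_lat, node_lon)
    for node_lat, node_lon in RouteNodeLL
}
df = pd.concat([df, pd.DataFrame(new_cols, index=df.index)], axis=1)
print(df)
```

With thousands of route nodes this still produces one column per node; if only the nearest node matters, keeping the running minimum instead of every column would also save memory.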
Solution
This is very inefficient:
def FindLL(postcode):
    if postcode in postcode_geo_postcodes:
        for x in range(0,len(postcode_geo)):
            if postcode == postcode_geo[x][0]:
                latitude = postcode_geo[x][7]
                longitude = postcode_geo[x][8]
                return f'{longitude},{latitude}'
    else:
        print("Postcode can't find longitude/latitude, please check format")
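You are searching the entire postcode list, and doing it the slow way (iterating over the list in Python and checking each item) instead of using list.index(). On top of that, you first search the same long list with the in operator, although that part is faster because the scan runs inside the interpreter's C code rather than as a Python-level loop.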
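A minimal sketch of the list.index() variant mentioned above, which replaces both the in check and the Python-level loop with a single scan done in C. The rows are made-up placeholders in the layout the question's code assumes (postcode in column 0, latitude in column 7, longitude in column 8). It is still a linear search, so it only removes interpreter overhead, not the scan itself.

```python
# Placeholder rows: column 0 = postcode, 7 = latitude, 8 = longitude.
postcode_geo = [
    ["AA1 1AA", "", "", "", "", "", "", "51.0", "-1.0"],
    ["BB2 2BB", "", "", "", "", "", "", "51.4", "-0.3"],
]
postcode_geo_postcodes = [row[0] for row in postcode_geo]


def FindLL(postcode):
    """Return 'longitude,latitude' for postcode, or None if it is unknown."""
    try:
        # list.index scans in C and raises ValueError when the item is absent,
        # so a separate `in` check is unnecessary.
        x = postcode_geo_postcodes.index(postcode)
    except ValueError:
        print("Postcode can't find longitude/latitude, please check format")
        return None
    latitude = postcode_geo[x][7]
    longitude = postcode_geo[x][8]
    return f'{longitude},{latitude}'


print(FindLL("BB2 2BB"))  # prints -0.3,51.4
```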
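Instead, use a dictionary. Dictionaries use hashing, so they jump straight to the item you're looking for without searching through the others, which is much faster. For example: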
postcode_geo = {}
# ...
with open (r'Desktop\Python Projects\Work - Transport Movement\open_postcode_geo.csv', 'r') as csvfile:
    reader = csv.reader(csvfile)
    for row in reader:
        postcode_geo[row[0]] = row
# ...
def FindLL(postcode):
    if postcode in postcode_geo:
        latitude = postcode_geo[postcode][7]
        longitude = postcode_geo[postcode][8]
        return f'{longitude},{latitude}'
    else:
        print("Postcode can't find longitude/latitude, please check format")
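To see the list-versus-dictionary difference for yourself, here is a rough timing sketch on synthetic keys (the key format and sizes are made up; absolute times depend on the machine, so no specific figures are claimed). It looks up the last key, which is the worst case for the list.

```python
import timeit

# Synthetic stand-in for the postcode column: 200,000 made-up keys.
keys = [f"PC{i:06d}" for i in range(200_000)]
as_list = keys
as_dict = {k: i for i, k in enumerate(keys)}
target = keys[-1]  # worst case for a linear scan

# `in` on a list compares element by element (O(n));
# `in` on a dict hashes the key and goes straight to its slot (O(1) on average).
list_time = timeit.timeit(lambda: target in as_list, number=100)
dict_time = timeit.timeit(lambda: target in as_dict, number=100)
print(f"list: {list_time:.4f} s, dict: {dict_time:.6f} s for 100 lookups")
```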
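You could make this even faster by reading the postcode geo map into a dataframe and joining it to the other dataframe, but I'm not familiar enough with Pandas to try that.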
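A minimal sketch of that pandas route, assuming the postcode CSV has no header row and keeps postcode, latitude and longitude in columns 0, 7 and 8 as the question's indices imply. The in-memory CSV text, the placeholder postcodes and the sites frame with a 'Postcode' column are all hypothetical; with the real file you would pass its path to pd.read_csv instead.

```python
import io

import pandas as pd

# Made-up stand-in for open_postcode_geo.csv (columns 0, 7, 8 are used).
csv_text = (
    "AA1 1AA,x,x,x,x,x,x,51.0,-1.0\n"
    "BB2 2BB,x,x,x,x,x,x,51.4,-0.3\n"
)
geo = pd.read_csv(io.StringIO(csv_text), header=None, usecols=[0, 7, 8])
geo.columns = ["postcode", "latitude", "longitude"]
geo = geo.set_index("postcode")  # index lookups are hash-based, like a dict


def FindLL(postcode):
    """Single lookup against the indexed table; None if unknown."""
    if postcode in geo.index:
        row = geo.loc[postcode]
        return f"{row['longitude']},{row['latitude']}"
    print("Postcode can't find longitude/latitude, please check format")
    return None


# For many postcodes at once, a join does every lookup in one call.
sites = pd.DataFrame({"Postcode": ["BB2 2BB", "AA1 1AA", "ZZ9 9ZZ"]})
sites = sites.join(geo, on="Postcode")  # left join; unknown postcodes get NaN
print(sites)
```

DataFrame.join with on= matches a column of the calling frame against the other frame's index, which is why the geo table is indexed by postcode first.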