Small Python program takes 15 minutes to run

Problem description

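My code works as expected, but this part takes 15 minutes to run. Most of that time is spent in the DataFrame section.

I'm new to Python, so I'd like to know how to make the code more efficient.

I think the problem is the for loop that iterates over the DataFrame? How can I change it so that it doesn't loop through every row and instead performs the calculation on the DataFrame columns directly?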

import csv
import requests
import json
import openpyxl
import xml.etree.ElementTree as ET
import numpy as np
import pandas as pd
import xlrd
import math
import time

start_time = time.time()

postcode_geo = []

with open (r'Desktop\Python Projects\Work - Transport Movement\open_postcode_geo.csv', 'r') as csvfile:
    reader = csv.reader(csvfile)
    for row in reader:
        postcode_geo.append(row)

#function to find lat,lon of given postcode
postcode_geo_postcodes = [x[0] for x in postcode_geo]

def FindLL(postcode):
    if postcode in postcode_geo_postcodes:
        for x in range(0,len(postcode_geo)):
            if postcode == postcode_geo[x][0]:
                latitude = postcode_geo[x][7]
                longitude = postcode_geo[x][8]
                return f'{longitude},{latitude}'  
    else:
        print("Postcode can't find longitude/latitude, please check format")

#user inputs start and end postcode
start_postcode = input("Enter start postcode here (with spaces):")
end_postcode = input("Enter end postcode here (with spaces):")

#find lat,lon of postcodes
StartLL = FindLL(start_postcode)
EndLL = FindLL(end_postcode)

print(StartLL)
print(EndLL)

#Get nodes from project osrm API Example = #http://router.project-osrm.org/route/v1/driving/-0.2507693,51.364718;-0.3795724,51.6110899?alternatives=false&annotations=nodes
route = requests.get(f'http://router.project-osrm.org/route/v1/driving/{StartLL};{EndLL}?alternatives=false&annotations=nodes')
routejson = route.json()
route_nodes = routejson['routes'][0]['legs'][0]['annotation']['nodes']

print(route_nodes)

#Turn nodes into longitude and latitude
RouteNodeLL = []

for node in route_nodes:
    response_xml = requests.get(f'https://api.openstreetmap.org/api/0.6/node/{node}')
    response_xml_as_string = response_xml.content
    responseXml = ET.fromstring(response_xml_as_string)
    for child in responseXml.iter('node'):
        RouteNodeLL.append((float(child.attrib['lat']), float(child.attrib['lon'])))


#create dataframe of current locations and add columns showing distance from nodes
df = pd.read_excel(r'Desktop\Python Projects\Work - Transport Movement\CurrentYodelTransportLocations.xlsx')

df['Latitude v2'] = df['Latitude v2'].astype(float)

for waypointLat, waypointLong in RouteNodeLL:
    for label, row in df.iterrows():
        df.loc[label,f'Distance From Node - {waypointLat}, {waypointLong}'] = (((math.acos(math.sin((row['Latitude v2']*math.pi/180)) * math.sin((waypointLat*math.pi/180))+math.cos((row['Latitude v2']*math.pi/180)) * math.cos((waypointLat*math.pi/180)) * math.cos(((row['Longitude v2'] - waypointLong )*math.pi/180))))*180/math.pi)*60*1.1515*1.609344)

print(df)

print("--- %s seconds ---" % (time.time() - start_time))

Tags: python, pandas, dataframe

Solution


This is very inefficient:

def FindLL(postcode):
    if postcode in postcode_geo_postcodes:
        for x in range(0,len(postcode_geo)):
            if postcode == postcode_geo[x][0]:
                latitude = postcode_geo[x][7]
                longitude = postcode_geo[x][8]
                return f'{longitude},{latitude}'  
    else:
        print("Postcode can't find longitude/latitude, please check format")

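You are searching the entire list of postcodes, and doing it the slow way (iterating over the list in Python and comparing each item) rather than using list.index(). And you do that only after already scanning an equally long list with in. That first scan is faster because it is a single Python statement, but it is still a linear search through the whole list.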

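Instead, use a dictionary. Dictionaries use hashing, so they can go straight to the item you are looking for without scanning, which is much faster. For example: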

postcode_geo = {}

# ...

with open (r'Desktop\Python Projects\Work - Transport Movement\open_postcode_geo.csv', 'r') as csvfile:
    reader = csv.reader(csvfile)
    for row in reader:
        postcode_geo[row[0]] = row

# ...

def FindLL(postcode):
    if postcode in postcode_geo:
        latitude = postcode_geo[postcode][7]
        longitude = postcode_geo[postcode][8]
        return f'{longitude},{latitude}'  
    else:
        print("Postcode can't find longitude/latitude, please check format")
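
To get a rough feel for the difference, here is a small timing sketch. The rows are synthetic (made-up keys like "PC000042", not real data from open_postcode_geo.csv), and the helper names find_in_list and find_in_dict exist only for this illustration. Exact timings will vary by machine.

```python
import timeit

# Synthetic stand-in for the CSV rows: postcode in column 0,
# latitude in column 7, longitude in column 8 (same layout as the question).
rows = [[f"PC{i:06d}", "", "", "", "", "", "", str(50 + i * 1e-6), str(-1 + i * 1e-6)]
        for i in range(200_000)]

postcode_list = [r[0] for r in rows]     # list: lookups scan item by item
postcode_dict = {r[0]: r for r in rows}  # dict: lookups go through a hash table


def find_in_list(postcode):
    """Linear scan: a membership test plus list.index(), both O(n)."""
    if postcode in postcode_list:
        row = rows[postcode_list.index(postcode)]
        return f"{row[8]},{row[7]}"
    return None


def find_in_dict(postcode):
    """Hash lookup: O(1) on average, however many rows there are."""
    row = postcode_dict.get(postcode)
    if row is None:
        return None
    return f"{row[8]},{row[7]}"


target = postcode_list[-1]  # worst case for the list scan: the last entry
list_time = timeit.timeit(lambda: find_in_list(target), number=20)
dict_time = timeit.timeit(lambda: find_in_dict(target), number=20)
print(f"list: {list_time:.4f}s  dict: {dict_time:.6f}s")
```

Note that FindLL is only called twice per run in the question, so this change helps most when many postcodes are looked up. It is probably not the whole story behind the 15 minutes.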

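You could probably do this even faster by reading the postcode geo map into a DataFrame and joining it to the other DataFrame, but I'm not familiar enough with Pandas to attempt that.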
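
On the Pandas side, the nested loop the question suspects (iterrows plus df.loc assignment for every row and every node) can be replaced by a single NumPy broadcasting step. The following is only a sketch under some assumptions: the function name distances_to_nodes_km is made up, the sample coordinates stand in for the spreadsheet and the OSRM nodes, and the formula is the same spherical law of cosines used in the question. Column names follow the question's 'Latitude v2' / 'Longitude v2' and 'Distance From Node - ...' pattern.

```python
import numpy as np
import pandas as pd


def distances_to_nodes_km(df, route_nodes_ll):
    """Return a DataFrame with one distance column (km) per route node.

    Same formula as the loop in the question, evaluated for all rows
    and all nodes at once instead of one cell at a time.
    """
    lat = np.radians(df["Latitude v2"].to_numpy(dtype=float))[:, None]   # shape (rows, 1)
    lon = np.radians(df["Longitude v2"].to_numpy(dtype=float))[:, None]
    nodes = np.asarray(route_nodes_ll, dtype=float)                      # shape (nodes, 2)
    node_lat = np.radians(nodes[:, 0])[None, :]                          # shape (1, nodes)
    node_lon = np.radians(nodes[:, 1])[None, :]

    # Broadcasting (rows, 1) against (1, nodes) gives a (rows, nodes) grid.
    cos_angle = (np.sin(lat) * np.sin(node_lat)
                 + np.cos(lat) * np.cos(node_lat) * np.cos(lon - node_lon))
    # Rounding can push values slightly outside [-1, 1], where arccos is undefined.
    angle_deg = np.degrees(np.arccos(np.clip(cos_angle, -1.0, 1.0)))
    km = angle_deg * 60 * 1.1515 * 1.609344

    columns = [f"Distance From Node - {la}, {lo}" for la, lo in route_nodes_ll]
    return pd.DataFrame(km, index=df.index, columns=columns)


# Hypothetical sample data standing in for the spreadsheet and the route nodes.
df = pd.DataFrame({"Latitude v2": [51.5074, 53.4808],
                   "Longitude v2": [-0.1278, -2.2426]})
RouteNodeLL = [(51.364718, -0.2507693), (51.6110899, -0.3795724)]

# Attach all distance columns in one go rather than one cell at a time.
df = pd.concat([df, distances_to_nodes_km(df, RouteNodeLL)], axis=1)
print(df)
```

Building all the columns first and attaching them with a single pd.concat avoids growing the DataFrame one cell at a time, which is likely where much of the loop's time goes.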

