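python - Converting physical addresses to geographic latitude and longitude
Problem description
I have read a CSV file (containing customer addresses) and loaded the data into a DataFrame.
Description of the CSV file (or DataFrame)
The DataFrame contains several rows and 5 columns
Sample data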
Address1 Address3 Post_Code City_Name Full_Address
10000009 37 RUE DE LA GARE L-7535 MERSCH 37 RUE DE LA GARE,L-7535, MERSCH
10000009 37 RUE DE LA GARE L-7535 MERSCH 37 RUE DE LA GARE,L-7535, MERSCH
10000009 37 RUE DE LA GARE L-7535 MERSCH 37 RUE DE LA GARE,L-7535, MERSCH
10001998 RUE EDWARD STEICHEN L-1855 LUXEMBOURG RUE EDWARD STEICHEN,L-1855,LUXEMBOURG
11000051 9 RUE DU BRILL L-3898 FOETZ 9 RUE DU BRILL,L-3898 ,FOETZ
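I wrote some code (Geocode with Python) to convert the physical addresses to geographic locations (latitude and longitude), but it keeps raising errors.
This is the code I have written so far: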
import folium
import pandas as pd
from geopy.geocoders import Nominatim
from geopy.extra.rate_limiter import RateLimiter
# Read the CSV, by the way the csv file contains 43 columns
ERP_Data = pd.read_csv("test.csv")
# Extracting the address information into a new DataFrame
Address_info= ERP_Data[['Address1','Address3','Post_Code','City_Name']].copy()
# Adding a new column called (Full_Address) that concatenate address columns into one
# for example Karlaplan 13,115 20,STOCKHOLM,Stockholms län, Sweden
Address_info['Full_Address'] = Address_info[Address_info.columns[1:]].apply(
    lambda x: ','.join(x.dropna().astype(str)), axis=1)
locator = Nominatim(user_agent="myGeocoder") # holds the Geocoding service, Nominatim
# 1 - convenient function to delay between geocoding calls
geocode = RateLimiter(locator.geocode, min_delay_seconds=1)
# 2- create location column
Address_info['location'] = Address_info['Full_Address'].apply(geocode)
# 3 - create latitude, longitude and altitude from location column (returns tuple)
Address_info['point'] = Address_info['location'].apply(lambda loc: tuple(loc.point) if loc else None)
# 4 - split point column into latitude, longitude and altitude columns
Address_info[['latitude', 'longitude', 'altitude']] = pd.DataFrame(Address_info['point'].tolist(), index=Address_info.index)
# using Folium to map out the points we created
folium_map = folium.Map(location=[49.61167,6.13], zoom_start=12,)
An example of the full error output:
RateLimiter caught an error, retrying (0/2 tries). Called with (*('44 AVENUE JOHN FITZGERALD KENNEDY,L-1855,LUXEMBOURG',), **{}).
Traceback (most recent call last):
File "e:\Anaconda3\lib\urllib\request.py", line 1317, in do_open
encode_chunked=req.has_header('Transfer-encoding'))
File "e:\Anaconda3\lib\http\client.py", line 1244, in request
self._send_request(method, url, body, headers, encode_chunked)
File "e:\Anaconda3\lib\http\client.py", line 1290, in _send_request
self.endheaders(body, encode_chunked=encode_chunked)
File "e:\Anaconda3\lib\http\client.py", line 1239, in endheaders
self._send_output(message_body, encode_chunked=encode_chunked)
File "e:\Anaconda3\lib\http\client.py", line 1026, in _send_output
self.send(msg)
File "e:\Anaconda3\lib\http\client.py", line 966, in send
self.connect()
File "e:\Anaconda3\lib\http\client.py", line 1414, in connect
server_hostname=server_hostname)
File "e:\Anaconda3\lib\ssl.py", line 423, in wrap_socket
session=session
File "e:\Anaconda3\lib\ssl.py", line 870, in _create
self.do_handshake()
File "e:\Anaconda3\lib\ssl.py", line 1139, in do_handshake
self._sslobj.do_handshake()
socket.timeout: _ssl.c:1059: The handshake operation timed out
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "e:\Anaconda3\lib\site-packages\geopy\geocoders\base.py", line 355, in _call_geocoder
page = requester(req, timeout=timeout, **kwargs)
File "e:\Anaconda3\lib\urllib\request.py", line 525, in open
response = self._open(req, data)
File "e:\Anaconda3\lib\urllib\request.py", line 543, in _open
'_open', req)
File "e:\Anaconda3\lib\urllib\request.py", line 503, in _call_chain
result = func(*args)
File "e:\Anaconda3\lib\urllib\request.py", line 1360, in https_open
context=self._context, check_hostname=self._check_hostname)
File "e:\Anaconda3\lib\urllib\request.py", line 1319, in do_open
raise URLError(err)
urllib.error.URLError: <urlopen error _ssl.c:1059: The handshake operation timed out>
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "e:\Anaconda3\lib\site-packages\geopy\extra\rate_limiter.py", line 126, in __call__
return self.func(*args, **kwargs)
File "e:\Anaconda3\lib\site-packages\geopy\geocoders\osm.py", line 387, in geocode
self._call_geocoder(url, timeout=timeout), exactly_one
File "e:\Anaconda3\lib\site-packages\geopy\geocoders\base.py", line 378, in _call_geocoder
raise GeocoderTimedOut('Service timed out')
geopy.exc.GeocoderTimedOut: Service timed out
The expected output is:
Address1 Address3 Post_Code City_Name Full_Address Latitude Longitude
10000009 37 RUE DE LA GARE L-7535 MERSCH 37 RUE DE LA GARE,L-7535, MERSCH 49.7508296 6.1085476
10000009 37 RUE DE LA GARE L-7535 MERSCH 37 RUE DE LA GARE,L-7535, MERSCH 49.7508296 6.1085476
10000009 37 RUE DE LA GARE L-7535 MERSCH 37 RUE DE LA GARE,L-7535, MERSCH 49.7508296 6.1085476
10001998 RUE EDWARD STEICHEN L-1855 LUXEMBOURG RUE EDWARD STEICHEN,L-1855,LUXEMBOURG 49.6302147 6.1713374
11000051 9 RUE DU BRILL L-3898 FOETZ 9 RUE DU BRILL,L-3898 ,FOETZ 49.5217917 6.0101385
Solution
I have updated your code:
- Added Address_info = Address_info.apply(lambda x: x.str.strip(), axis=1), which removes whitespace from the start and end of each str value
- Added a function with try-except to handle the lookup
import time
import pandas as pd
from geopy.geocoders import Nominatim
from geopy.exc import GeocoderTimedOut, GeocoderQuotaExceeded
ERP_Data = pd.read_csv("test.csv")
# Extracting the address information into a new DataFrame
Address_info= ERP_Data[['Address1','Address3','Post_Code','City_Name']].copy()
# Clean existing whitespace from the ends of the strings
Address_info = Address_info.apply(lambda x: x.str.strip(), axis=1) # ← added
# Adding a new column called (Full_Address) that concatenate address columns into one
# for example Karlaplan 13,115 20,STOCKHOLM,Stockholms län, Sweden
Address_info['Full_Address'] = Address_info[Address_info.columns[1:]].apply(lambda x: ','.join(x.dropna().astype(str)), axis=1)
locator = Nominatim(user_agent="myGeocoder") # holds the Geocoding service, Nominatim
# 1 - convenient function to delay between geocoding calls
# geocode = RateLimiter(locator.geocode, min_delay_seconds=1)
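def geocode_me(location):
    # Pause between requests to stay under the service's rate limit
    time.sleep(1.1)
    try:
        return locator.geocode(location)
    except (GeocoderTimedOut, GeocoderQuotaExceeded) as e:
        # Test the caught exception instance; the bare class is always truthy
        if isinstance(e, GeocoderQuotaExceeded):
            print(e)
        else:
            print(f'Location not found: {e}')
        return None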
# 2- create location column
Address_info['location'] = Address_info['Full_Address'].apply(lambda x: geocode_me(x)) # ← note the change here
# 3 - create longitude, latitude and altitude from location column (returns tuple)
Address_info['point'] = Address_info['location'].apply(lambda loc: tuple(loc.point) if loc else None)
# 4 - split point column into latitude, longitude and altitude columns
Address_info[['latitude', 'longitude', 'altitude']] = pd.DataFrame(Address_info['point'].tolist(), index=Address_info.index)
Output:
Address1 Address3 Post_Code City_Name Full_Address location point latitude longitude altitude
10000009 37 RUE DE LA GARE L-7535 MERSCH 37 RUE DE LA GARE,L-7535,MERSCH (Rue de la Gare, Mersch, Canton Mersch, 7535, Lëtzebuerg, (49.7508296, 6.1085476)) (49.7508296, 6.1085476, 0.0) 49.750830 6.108548 0.0
10000009 37 RUE DE LA GARE L-7535 MERSCH 37 RUE DE LA GARE,L-7535,MERSCH (Rue de la Gare, Mersch, Canton Mersch, 7535, Lëtzebuerg, (49.7508296, 6.1085476)) (49.7508296, 6.1085476, 0.0) 49.750830 6.108548 0.0
10000009 37 RUE DE LA GARE L-7535 MERSCH 37 RUE DE LA GARE,L-7535,MERSCH (Rue de la Gare, Mersch, Canton Mersch, 7535, Lëtzebuerg, (49.7508296, 6.1085476)) (49.7508296, 6.1085476, 0.0) 49.750830 6.108548 0.0
10001998 RUE EDWARD STEICHEN L-1855 LUXEMBOURG RUE EDWARD STEICHEN,L-1855,LUXEMBOURG (Rue Edward Steichen, Grünewald, Weimershof, Neudorf-Weimershof, Luxembourg, Canton Luxembourg, 2540, Lëtzebuerg, (49.6302147, 6.1713374)) (49.6302147, 6.1713374, 0.0) 49.630215 6.171337 0.0
11000051 9 RUE DU BRILL L-3898 FOETZ 9 RUE DU BRILL,L-3898,FOETZ (Rue du Brill, Mondercange, Canton Esch-sur-Alzette, 3898, Luxembourg, (49.5217917, 6.0101385)) (49.5217917, 6.0101385, 0.0) 49.521792 6.010139 0.0
10000052 3 RUE DU PUITS ROMAIN L-8070 BERTRANGE 3 RUE DU PUITS ROMAIN,L-8070,BERTRANGE (Rue du Puits Romain, Z.A. Bourmicht, Bertrange, Canton Luxembourg, 8070, Lëtzebuerg, (49.6084531, 6.0771901)) (49.6084531, 6.0771901, 0.0) 49.608453 6.077190 0.0
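The geocode_me function above gives up on an address after a single timeout. One possible extension, sketched here under assumptions, is to retry a failed call a few times with a growing wait in between. The names with_retries, max_attempts, base_delay, retry_on and sleep are hypothetical helpers for illustration and are not part of geopy.

```python
import time


def with_retries(func, max_attempts=3, base_delay=1.0,
                 retry_on=(Exception,), sleep=time.sleep):
    """Wrap func so that failed calls are retried with a growing delay.

    Hypothetical helper: pass retry_on=(GeocoderTimedOut,) to retry
    only on geopy timeouts instead of on every exception.
    """
    def wrapper(*args, **kwargs):
        for attempt in range(1, max_attempts + 1):
            try:
                return func(*args, **kwargs)
            except retry_on as exc:
                if attempt == max_attempts:
                    # Out of attempts: report the error and return no result
                    print(f"Giving up after {attempt} attempts: {exc}")
                    return None
                # Wait 1x, 2x, 4x, ... the base delay before the next try
                sleep(base_delay * 2 ** (attempt - 1))
    return wrapper
```

Assuming the locator object from the code above, it could be used as geocode_me = with_retries(locator.geocode, retry_on=(GeocoderTimedOut,)). Retrying does not help once the service has started rejecting requests for the day, so the attempt count is best kept small.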
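Notes and other resources:
- The output includes the address that caused the traceback error:
RateLimiter caught an error, retrying (0/2 tries). Called with (*('3 RUE DU PUITS ROMAIN ,L-8070 ,BERTRANGE ',)
- Note all the extra whitespace in the address. I added a line of code to strip whitespace from the start and end of the strings
- GeocoderTimedOut, a real pain?
- Geopy: catch timeout error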
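The whitespace problem in the notes above can also be handled while the full address is being assembled. The following is a minimal sketch, not the code from the answer: build_full_address is a hypothetical helper that trims each part, skips missing or blank values, and joins the rest with commas.

```python
def build_full_address(parts, sep=","):
    """Join address parts after trimming each one and dropping empty values."""
    cleaned = []
    for part in parts:
        # Skip missing values: None, or NaN (the only value not equal to itself)
        if part is None or part != part:
            continue
        text = str(part).strip()
        if text:
            cleaned.append(text)
    return sep.join(cleaned)


# The address from the error message, with its trailing spaces
print(build_full_address(["3 RUE DU PUITS ROMAIN ", "L-8070 ", "BERTRANGE "]))
# prints: 3 RUE DU PUITS ROMAIN,L-8070,BERTRANGE
```

With the DataFrame from the question, this could probably be applied row by row, for example Address_info[['Address3', 'Post_Code', 'City_Name']].apply(build_full_address, axis=1). Because every part goes through str() first, numeric columns are left usable as well.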
Finally:
- In the end, the service timed out for the day with HTTP Error 429: Too Many Requests
- Review the Nominatim Usage Policy
- Suggestion: use a different geocoder
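Since the request quota is the limiting factor, it may also help to send fewer requests. The sample data repeats the same address on several rows, so one option is to look up each distinct address only once. This is a sketch under assumptions: geocode_unique is a hypothetical helper, and geocode stands for any callable that takes an address string, such as geocode_me above.

```python
def geocode_unique(addresses, geocode):
    """Call geocode once per distinct address and reuse the result for repeats."""
    cache = {}
    results = []
    for address in addresses:
        if address not in cache:
            # First time this address is seen: perform the actual lookup
            cache[address] = geocode(address)
        results.append(cache[address])
    return results
```

The returned list has one entry per input row, in the same order, so it could be assigned directly, for example Address_info['location'] = geocode_unique(Address_info['Full_Address'], geocode_me). Note that failed lookups (None) are cached too, so an address that timed out is not retried within the same run.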