Checking points in two polygons (code improvement)

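Problem description

Question: I want to improve my code, which finds the car IDs that start and end their trip between 2 given city zones.

I have:

  1. A .csv file containing the city zones, like this: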

    borders =
    
    zone longitude latitude multi
    12   3.5248    22.0952  MULTIPOLYGON(((3.4991688909 22.1096707778,3.4992650150 22.1094740107, ... ,3.4992922409 22.1094203597,3.4995744041 22.1087939694,3.4997139945 22.1081206986)))
    14   3.5139    22.111   MULTIPOLYGON(((12.4991688909 22.1096707778,3.4992650150 22.1094740107, ... ,32.4992922409 22.1094203597,3.4995744041 32.1087939694,3.4997139945 22.1081206986)))
    ...
    800  3.5273    22.1019  MULTIPOLYGON(((4.4991688909 15.1096707778,3.4992650150 22.1094740107, ... ,4.4992922409 75.1094203597,3.4995744041 22.1087939694,3.4997139945 22.1081206986)))
    

  2. A .csv file containing the taxi data:

    data = 
    
    ID      latitude longitude epoch        day_of_week
    
    e35f6   11.9125  3.7432    8765456787    Sunday
    e35f6   11.9125  3.7432    4567876545    Sunday
    ...
    fhg3g   23.9125  5.7432    2345434554    Sunday
    

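So I want to check whether a car ID starts its trip in zone 12 and ends it in zone 14 (but I want to check this for every zone).

What I have done so far is a very time-consuming process, so I am looking for improvements. This is my code: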

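import glob
import os

import geopandas as gpd
import pandas as pd
from shapely.geometry import Point, Polygon

# First and last record per car, taken as the start and end of its trip
df_first = df.drop_duplicates(subset=['id_easy'], keep='first')
df_last = df.drop_duplicates(subset=['id_easy'], keep='last')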

crs = 'EPSG:4326'  # WGS 84 longitude/latitude; the {'init': ...} form is deprecated
geometry_first = [Point(xy) for xy in zip(df_first.longitude,df_first.latitude)]
df_first = gpd.GeoDataFrame(df_first,crs=crs,geometry=geometry_first)

geometry_last = [Point(xy) for xy in zip(df_last.longitude,df_last.latitude)]
df_last = gpd.GeoDataFrame(df_last,crs=crs,geometry=geometry_last)

border_1 = pd.read_csv("D:/anaconda path/PTV/1) Data preparation/between zones/zone1.csv")

geometry_1 = [Point(xy) for xy in zip(border_1.longitude,border_1.latitude)]
border_1 = gpd.GeoDataFrame(border_1,crs=crs,geometry=geometry_1)

border_2 = pd.read_csv("D:/anaconda path/PTV/1) Data preparation/between zones/zone2.csv")

geometry_2 = [Point(xy) for xy in zip(border_2.longitude,border_2.latitude)]
border_2 = gpd.GeoDataFrame(border_2,crs=crs,geometry=geometry_2)

turin_final_1 = Polygon([[p.x, p.y] for p in border_1.geometry])
first = df_first[df_first.geometry.within(turin_final_1)]

turin_final_2 = Polygon([[p.x, p.y] for p in border_2.geometry])
last = df_last[df_last.geometry.within(turin_final_2)]
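# Work on copies so the assignments below do not write into slices of df_first/df_last
first = first.copy()
last = last.copy()
first['epoch'] = pd.to_datetime(first['epoch'], unit='s')  # epoch is in seconds
last['epoch'] = pd.to_datetime(last['epoch'], unit='s')

first.index = first['epoch']
last.index = last['epoch']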

first1 = first.between_time('0:00', '1:00')
last1 = last.between_time('0:00', '1:00')  # repeated for each hour slot up to 24:00

first1.to_csv(r'D:\anaconda path\PTV\1) Data preparation\between zones\df1\Saturday1_first1.csv',index=False)
last1.to_csv(r'D:\anaconda path\PTV\1) Data preparation\between zones\df2\Saturday1_last1.csv',index=False)  # repeated for each hour slot up to 24:00

os.chdir("D:/anaconda path/PTV/1) Data preparation/between zones/df1")

extension = 'csv'
all_filenames = [i for i in glob.glob('*.{}'.format(extension))]

df1 = pd.concat([pd.read_csv(f) for f in all_filenames ])
#export to csv
df1.to_csv( "df1.csv", index=False, encoding='utf-8-sig')

os.chdir("D:/anaconda path/PTV/1) Data preparation/between zones/df2")

extension = 'csv'
all_filenames = [i for i in glob.glob('*.{}'.format(extension))]

df2 = pd.concat([pd.read_csv(f) for f in all_filenames ])
#export to csv
df2.to_csv( "df2.csv", index=False, encoding='utf-8-sig')

df1 = pd.read_csv("D:/anaconda path/PTV/1) Data preparation/between zones/df1/df1.csv")
df2 = pd.read_csv("D:/anaconda path/PTV/1) Data preparation/between zones/df2/df2.csv")

df3 = (pd.concat((df1[df1.id_easy.isin(df2.id_easy)],
            df2[df2.id_easy.isin(df1.id_easy)]),
           ignore_index=True)
    .sort_values('id_easy'))

Tags: python, pandas, geolocation, geopandas

Solution


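If I understand correctly, you want to assign a zone number to the start and end point of each car. Since your border data contains polygon information on the shape of the city zones (the multi column), I suggest using it in a spatial join on the start/end points to see in which zone each car starts/ends its trip. Here is my suggestion in more detail:

I assume you have already read your data into border (the zones) and df (the car data) and that it is in the right format (i.e. each cell in the multi column of border contains a shapely MultiPolygon). I also assume that your zones are disjoint, i.e. they do not overlap.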
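If the multi column still holds plain text after reading the .csv, it has to be parsed first. A minimal sketch, assuming the column contains WKT strings like the ones shown in the question; the two square zones below are made up for illustration:

```python
import pandas as pd
from shapely import wkt

# Hypothetical stand-in for the zones file; the real file has many more rows.
border = pd.DataFrame({
    "zone": [12, 14],
    "multi": [
        "MULTIPOLYGON(((0 0, 1 0, 1 1, 0 1, 0 0)))",
        "MULTIPOLYGON(((1 0, 2 0, 2 1, 1 1, 1 0)))",
    ],
})

# Parse each WKT string into a shapely geometry object.
border["multi"] = border["multi"].apply(wkt.loads)

print(border["multi"].iloc[0].geom_type)
```

After this step each cell is a shapely object, which is what the rest of the answer expects.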

A GeoDataFrame needs a geometry column; by default it treats the column named geometry as the geometry information:

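border['geometry'] = border['multi']
border = gpd.GeoDataFrame(border, geometry='geometry', crs='EPSG:4326')  # sjoin needs a GeoDataFrame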

Now we also generate the geometry information for the points given in the car data df:

# Build the points in one vectorized call (x = longitude, y = latitude)
df = gpd.GeoDataFrame(df, geometry=gpd.points_from_xy(df['longitude'], df['latitude']), crs='EPSG:4326')

As you did, let's now extract the start and end points:

df_first = df.drop_duplicates(subset=['id_easy'], keep='first')
df_last = df.drop_duplicates(subset=['id_easy'], keep='last')

Now we can do the spatial join to get the zone of each start and end point:

# `predicate` replaces the older `op` keyword (geopandas 0.10 and later)
df_first = gpd.sjoin(df_first, border.loc[:, ['geometry', 'zone']], how='left', predicate='within')
df_last = gpd.sjoin(df_last, border.loc[:, ['geometry', 'zone']], how='left', predicate='within')
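To see what the join returns, here is a toy version of the same step. It is only a sketch: the two square zones and the three points are made up, and it assumes a geopandas version that accepts the `predicate` keyword (older versions call it `op`).

```python
import pandas as pd
import geopandas as gpd
from shapely.geometry import Polygon

# Two made-up square zones sharing an edge (not the real city zones).
border = gpd.GeoDataFrame(
    {"zone": [12, 14]},
    geometry=[
        Polygon([(0, 0), (1, 0), (1, 1), (0, 1)]),
        Polygon([(1, 0), (2, 0), (2, 1), (1, 1)]),
    ],
    crs="EPSG:4326",
)

# Made-up start points: one inside each zone and one outside both.
points = pd.DataFrame({
    "id_easy": ["a", "b", "c"],
    "longitude": [0.5, 1.5, 5.0],
    "latitude": [0.5, 0.5, 5.0],
})
df_first = gpd.GeoDataFrame(
    points,
    geometry=gpd.points_from_xy(points["longitude"], points["latitude"]),
    crs="EPSG:4326",
)

# Left join: every point is kept, points outside all zones get a missing zone.
joined = gpd.sjoin(
    df_first, border[["geometry", "zone"]], how="left", predicate="within"
)
print(joined[["id_easy", "zone"]])
```

Because the join is a left join, a car whose point falls outside every zone stays in the result with a missing zone value instead of being dropped.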

That's it. Now the zone column holds the zone of each point.
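From here, the original goal (cars that start in zone 12 and end in zone 14, checked for every pair of zones) can be reached with plain pandas instead of writing one .csv per hour. A minimal sketch, assuming df_first and df_last carry the id_easy, zone and epoch columns; the values, and the names trips, from_12_to_14 and od_counts, are made up for illustration:

```python
import pandas as pd

# Made-up join results; in practice these come from the spatial join.
df_first = pd.DataFrame({
    "id_easy": ["a", "b", "c"],
    "zone": [12, 12, 14],
    "epoch": [0, 3600, 7200],  # seconds since 1970-01-01
})
df_last = pd.DataFrame({
    "id_easy": ["a", "b", "c"],
    "zone": [14, 12, 12],
})

# One row per car with its start zone, end zone and start time.
trips = df_first[["id_easy", "zone", "epoch"]].merge(
    df_last[["id_easy", "zone"]], on="id_easy", suffixes=("_start", "_end")
)

# Hour of day of the start, replacing the hourly between_time slices.
trips["start_hour"] = pd.to_datetime(trips["epoch"], unit="s").dt.hour

# Cars that start in zone 12 and end in zone 14.
from_12_to_14 = trips[(trips["zone_start"] == 12) & (trips["zone_end"] == 14)]

# Trip counts for every start/end zone pair and hour at once.
od_counts = (
    trips.groupby(["zone_start", "zone_end", "start_hour"])
    .size()
    .reset_index(name="trips")
)
print(od_counts)
```

Keeping everything in one table avoids the intermediate files and the repeated per-zone filtering in the question's code.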

