Loop over multiple rows by index with pandas

Question

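Using the data below in train_data_sample and the code that follows, how can I loop over the latitude and longitude of every index? (See the expected result further down.)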

   latitude  longitude    price
0   55.6632    12.6288  2595000
1   55.6637    12.6291  2850000
2   55.6637    12.6291  2850000
3   55.6632    12.6290  3198000
4   55.6632    12.6290  2995000
5   55.6638    12.6294  2395000
6   55.6637    12.6291  2995000
7   55.6642    12.6285  4495000
8   55.6632    12.6285  3998000
9   55.6638    12.6294  3975000

from numpy import cos, sin, arcsin, sqrt
from math import radians

def haversine(row):
   
    for index in train_data_sample.index:
        lon1 = train_data_sample["longitude"].loc[train_data_sample.index==index]
        lat1 = train_data_sample["latitude"].loc[train_data_sample.index==index]
        lon2 = row['longitude']
        lat2 = row['latitude']
        lon1, lat1, lon2, lat2 = map(radians, [lon1, lat1, lon2, lat2])
        dlon = lon2 - lon1 
        dlat = lat2 - lat1 
        a = sin(dlat/2)**2 + cos(lat1) * cos(lat2) * sin(dlon/2)**2
        c = 2 * arcsin(sqrt(a)) 
        km = 6367 * c
    return km

def insert_dist(df):
    df["distance"+str(index)] = df.apply(lambda row: haversine(row), axis=1)
    return df

print(insert_dist(train_data_sample))

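This is the result for index 0. It compares the coordinates of index 0 with those of every other row and returns the distance. The values are in kilometers (the formula multiplies by an Earth radius of 6367 km), so the distance between the coordinates of index 0 and index 1 is about 0.059 km, i.e. roughly 59 meters.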

   latitude  longitude    price  distance0
0   55.6632    12.6288  2595000   0.000000
1   55.6637    12.6291  2850000   0.058658
2   55.6637    12.6291  2850000   0.058658
3   55.6632    12.6290  3198000   0.012536
4   55.6632    12.6290  2995000   0.012536
5   55.6638    12.6294  2395000   0.076550
6   55.6637    12.6291  2995000   0.058658
7   55.6642    12.6285  4495000   0.112705
8   55.6632    12.6285  3998000   0.018804
9   55.6638    12.6294  3975000   0.076550
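A quick way to check the unit of these numbers is to evaluate the same haversine formula for just index 0 and index 1 in plain Python. This is a minimal sketch assuming the 6367 km Earth radius used in the question's code; the name haversine_km is hypothetical.

```python
import math

# Earth radius in km, the same constant as in the question's code (an assumption, not a standard value)
EARTH_RADIUS_KM = 6367

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points given in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, [lat1, lon1, lat2, lon2])
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    a = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

# Index 0 vs index 1 from the sample data
d_km = haversine_km(55.6632, 12.6288, 55.6637, 12.6291)
print(f"{d_km:.4f} km = {d_km * 1000:.0f} m")  # prints: 0.0587 km = 59 m
```

The result matches the 0.058658 in the distance0 column, which confirms that column is in kilometres.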

The final result should contain not only distance0 but also distance1, distance2, and so on.
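One possible way to get every distanceN column at once, assuming NumPy broadcasting is acceptable, is to build an n × n distance matrix in a single vectorised step and turn its columns into the distanceN columns. The function name pairwise_haversine_km and the 6367 km default radius are choices made for this illustration; the data are the first five rows of the sample above.

```python
import numpy as np
import pandas as pd

# First five rows of the question's sample data
train_data_sample = pd.DataFrame({
    "latitude":  [55.6632, 55.6637, 55.6637, 55.6632, 55.6632],
    "longitude": [12.6288, 12.6291, 12.6291, 12.6290, 12.6290],
    "price":     [2595000, 2850000, 2850000, 3198000, 2995000],
})

def pairwise_haversine_km(lat_deg, lon_deg, radius_km=6367.0):
    """Return an (n, n) matrix of great-circle distances in km between all points."""
    lat = np.radians(np.asarray(lat_deg, dtype=float))
    lon = np.radians(np.asarray(lon_deg, dtype=float))
    # Broadcasting a column vector against a row vector yields every pair (i, j)
    dlat = lat[None, :] - lat[:, None]
    dlon = lon[None, :] - lon[:, None]
    a = (np.sin(dlat / 2) ** 2
         + np.cos(lat[:, None]) * np.cos(lat[None, :]) * np.sin(dlon / 2) ** 2)
    return 2 * radius_km * np.arcsin(np.sqrt(a))

dist = pairwise_haversine_km(train_data_sample["latitude"], train_data_sample["longitude"])

# Column j of the matrix holds the distances from row j to every row,
# which is what the "distance<j>" column is supposed to contain.
dist_cols = pd.DataFrame(
    dist,
    index=train_data_sample.index,
    columns=[f"distance{i}" for i in train_data_sample.index],
)
result = pd.concat([train_data_sample, dist_cols], axis=1)
print(result)
```

Because it materialises an n × n matrix, memory grows quadratically with the number of rows; that is fine for a few thousand listings but not for millions.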

Tags: python, pandas

Solution


It looks like you are making this a bit more complicated than necessary. You can get what you want more directly by nesting one for loop inside another.

from numpy import cos, sin, arcsin, sqrt
from math import radians
import pandas as pd
import numpy as np


# recreate your dataframe
data = [[55.6632, 12.6288, 2595000],
        [55.6637, 12.6291, 2850000],
        [55.6637, 12.6291, 2850000], 
        [55.6632, 12.6290, 3198000]]

data = np.array(data)

train_data_sample = pd.DataFrame(data, columns = ["latitude", "longitude", "price"])


# haversine distance code copied from the question (returns kilometres)
def GetDistance(lon1, lat1, lon2, lat2):
    lon1, lat1, lon2, lat2 = map(radians, [lon1, lat1, lon2, lat2])
    dlon = lon2 - lon1 
    dlat = lat2 - lat1 
    a = sin(dlat/2)**2 + cos(lat1) * cos(lat2) * sin(dlon/2)**2
    c = 2 * arcsin(sqrt(a)) 
    km = 6367 * c
    return km

# loop over every row with iterrows
for index, row in train_data_sample.iterrows():

    distances = []

    # take this row's coordinates: latitude first, then longitude
    lat1, lon1 = row[["latitude", "longitude"]]

    # loop again over every row with iterrows
    for index_2, row_2 in train_data_sample.iterrows():
        lat2, lon2 = row_2[["latitude", "longitude"]]
        # get the distance in km between row `index` and row `index_2`
        distances.append(GetDistance(lon1, lat1, lon2, lat2))

    # add the column to the dataframe
    train_data_sample["distance" + str(index)] = distances

print(train_data_sample)
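If you want to keep the nested-loop structure, one possible variant (a sketch, not the answer author's code) iterates with itertuples, collects every distance column in a dict, and joins them to the frame with a single pd.concat call instead of writing into train_data_sample while iterating over it. The helper names get_distance_km and add_distance_columns are hypothetical, and the 6367 km radius is carried over from the question.

```python
import math
import pandas as pd

def get_distance_km(lat1, lon1, lat2, lon2, radius_km=6367.0):
    """Haversine distance in km between two points given in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (lat1, lon1, lat2, lon2))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * radius_km * math.asin(math.sqrt(a))

def add_distance_columns(df):
    """Return a copy of df with one 'distance<index>' column per row (nested loop)."""
    # itertuples yields namedtuples with fields Index, latitude, longitude;
    # materialise them once so both loops reuse the same list
    points = list(df[["latitude", "longitude"]].itertuples())
    new_cols = {}
    for p in points:
        new_cols[f"distance{p.Index}"] = [
            get_distance_km(p.latitude, p.longitude, q.latitude, q.longitude)
            for q in points
        ]
    # Build all new columns first and join them in one step
    return pd.concat([df, pd.DataFrame(new_cols, index=df.index)], axis=1)

train_data_sample = pd.DataFrame({
    "latitude":  [55.6632, 55.6637, 55.6637, 55.6632],
    "longitude": [12.6288, 12.6291, 12.6291, 12.6290],
    "price":     [2595000, 2850000, 2850000, 3198000],
})
result = add_distance_columns(train_data_sample)
print(result)
```

Returning a new frame leaves the input untouched. With many rows, adding all columns in one step also avoids the "highly fragmented DataFrame" performance warning that recent pandas versions can emit when hundreds of columns are inserted one at a time.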

