Loop over a DataFrame and fill a URL request for each user group

Problem description

I have a pandas DataFrame of GPS points, as shown below:

    import pandas as pd
    d = {'user': ['A', 'A', 'A', 'A', 'A', 'A', 'B', 'B', 'B', 'C', 'C', 'C', 'C', 'C'], 'lat': [ 37.75243634842733, 37.75344580658182, 37.75405656449232, 37.753649393112181,37.75409897804892, 37.753937806404586, 37.72767062183685, 37.72710631810977, 37.72605407110467, 37.71141865080228, 37.712199505873926, 37.713285899241896, 37.71428740401767, 37.712810604103346], 'lon': [-122.41924881935118, -122.42006421089171, -122.419216632843, -122.41784334182738, -122.4169099330902, -122.41549372673035, -122.3878937959671, -122.3884356021881, -122.38841414451599, -122.44688630104064, -122.44474053382874, -122.44361400604248, -122.44260549545288, -122.44156479835509]}
    df = pd.DataFrame(data=d)
    

    user    lat         lon
0   A       37.752436   -122.419249
1   A       37.753446   -122.420064
2   A       37.754057   -122.419217
3   A       37.753649   -122.417843
4   A       37.754099   -122.416910
5   A       37.753938   -122.415494
6   B       37.727671   -122.387894
7   B       37.727106   -122.388436
8   B       37.726054   -122.388414
9   C       37.711419   -122.446886
10  C       37.712200   -122.444741
11  C       37.713286   -122.443614
12  C       37.714287   -122.442605
13  C       37.712811   -122.441565

With the functions below, I can pass all of the coordinates in df straight into a single OSRM request to map-match the GPS points:

    import numpy as np
    from typing import Dict, Any, List, Tuple
    import requests

    # Format NumPy array of (lat, lon) coordinates into a concatenated string formatted for OSRM server
    def format_coords(coords: np.ndarray) -> str:
        coords = ";".join([f"{lon:f},{lat:f}" for lat, lon in coords])
        return coords

    # Forward request to the OSRM server and return a dictionary of the JSON response.
    def make_request(
            coords: np.ndarray,
        ) -> Dict[str, Any]:
        coords = format_coords(coords)
        url = f"http://router.project-osrm.org/match/v1/car/{coords}"
        r = requests.get(url)
        return r.json()

    coords = df[['lat', 'lon']].values

    # Make request against the OSRM HTTP server
    output = make_request(coords)

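However, df is made up of separate GPS traces produced by different users, so I would like to write a function that loops over the DataFrame and sends each user's set of coordinates in its own request, rather than sending them all at once. What is the best way to do this?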

Tags: python, pandas, loops, python-requests

Solution


You can group the DataFrame by the user column with groupby, then call make_request on each group and store the result in the output dict, keyed by user:

    output = {}
    for user, g in df.groupby('user'):
        output[user] = make_request(g[['lat', 'lon']].values)
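
Since the question asks for a function, the same loop can be wrapped in a small helper. The following is only a sketch: match_per_user and its request_fn parameter are names made up here for illustration, not part of pandas or OSRM. Passing the request function in as an argument is a design assumption that lets the helper be tried with a stand-in function before any real request is sent.

```python
from typing import Any, Callable, Dict

import numpy as np
import pandas as pd


def match_per_user(
    df: pd.DataFrame,
    request_fn: Callable[[np.ndarray], Dict[str, Any]],
) -> Dict[str, Dict[str, Any]]:
    """Call request_fn once per user with that user's (lat, lon) rows.

    match_per_user and request_fn are hypothetical names for this sketch.
    """
    output = {}
    # groupby keeps the rows of each group in their original order,
    # so every trace is passed on in the order it appears in df.
    for user, group in df.groupby("user"):
        output[user] = request_fn(group[["lat", "lon"]].to_numpy())
    return output
```

With the make_request function from the question, the call would be output = match_per_user(df, make_request), and output['A'] would then hold the response for user A.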
