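python - Scrape latitude and longitude locations obtained from Mapbox
Problem description
I'm working on a Divvy dataset project.
I want to scrape the information for every suggested location, along with its comments, from here: http://suggest.divvybikes.com/
Can I scrape this information from Mapbox? It is displayed on a map, so the information must be available somewhere.
Solution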
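I visited the page and logged my network traffic using Google Chrome's developer tools. After filtering the requests to show only XHR (XmlHttpRequest) requests, I saw a lot of HTTP GET requests to various REST APIs. These REST APIs return JSON, which is ideal. Only two of them seem relevant for your purposes: one for places, and another for the comments associated with those places. The places API's JSON contains interesting information, such as place ids and coordinates. The comments API's JSON contains all the comments for a specific place, identified by its id. Mimicking these calls is pretty simple with the third-party requests module. Luckily, the APIs don't seem to care about request headers. The query-string parameters (the params dictionary) do need to be formulated carefully, of course.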
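To see what the params dictionary turns into on the wire, here is a small stdlib-only sketch that builds the same query string by hand with urllib.parse.urlencode. It assumes requests encodes the dictionary the same way, which it should for simple string values like these; the helper name build_places_url is hypothetical.

```python
from urllib.parse import urlencode

# Base URL of the places API, as observed in Chrome's network tab.
API_URL = "http://suggest.divvybikes.com/api/places"


def build_places_url(page_nr):
    """Hypothetical helper: build the URL that get_places requests for one page."""
    params = {
        "page": str(page_nr),
        "include_submissions": "true",
    }
    # urlencode keeps the dict's insertion order and percent-encodes values.
    return API_URL + "?" + urlencode(params)


print(build_places_url(1))
# http://suggest.divvybikes.com/api/places?page=1&include_submissions=true
```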
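I was able to come up with the following two functions. get_places makes multiple calls to the same API, each time with a different page query-string parameter. It seems that "page" is the term they use internally to split all their data into chunks: all the different places/features/stations are split across multiple pages, and each API call returns only one page. The loop runs over itertools.count(1), so it is effectively endless; it accumulates all the places in one giant list and keeps going until we receive a response telling us there are no more pages. Once the loop ends, we return the list of places.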
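The pagination logic can be exercised without touching the network. This sketch assumes, as get_places does, that each page is a dict with a "features" list and a "metadata" dict whose "next" entry is None on the last page. The names collect_all_pages and FAKE_PAGES are hypothetical, and the fake pages only stand in for real API responses.

```python
from itertools import count


def collect_all_pages(fetch_page):
    """Call fetch_page(1), fetch_page(2), ... until metadata["next"] is None.

    fetch_page is any callable returning a dict shaped like the places API
    response: {"features": [...], "metadata": {"next": ...}}.
    """
    features = []
    for page_nr in count(1):  # endless counter; the break below ends the loop
        content = fetch_page(page_nr)
        features.extend(content["features"])
        if content["metadata"]["next"] is None:
            break
    return features


# Fake three-page response, standing in for the real API.
FAKE_PAGES = {
    1: {"features": [{"id": 1}, {"id": 2}], "metadata": {"next": "page=2"}},
    2: {"features": [{"id": 3}], "metadata": {"next": "page=3"}},
    3: {"features": [], "metadata": {"next": None}},
}

places = collect_all_pages(FAKE_PAGES.__getitem__)
print([p["id"] for p in places])  # [1, 2, 3]
```

Passing the fetch function in as a parameter is just a testing convenience; in the real script it would wrap requests.get and response.json().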
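The other function is get_comments, which takes a place id as its argument. It makes an HTTP GET request to the appropriate API and returns the list of comments for that place. This list may be empty if there are no comments.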
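The part of get_comments that deals with the response body can be split out and checked offline. This sketch assumes the comments JSON has the shape the code below relies on: a "results" list of objects, each with a "comment" key. The helper comment_texts and the sample payload are hypothetical.

```python
def comment_texts(payload):
    """Return just the comment strings from a comments-API JSON payload.

    Assumes the shape {"results": [{"comment": "..."}, ...]}; a missing or
    empty "results" list yields an empty list (a place with no comments).
    """
    return [result["comment"] for result in payload.get("results", [])]


sample = {"results": [{"comment": "Please add a station here."}, {"comment": "+1"}]}
print(comment_texts(sample))           # ['Please add a station here.', '+1']
print(comment_texts({"results": []}))  # []
```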
def get_places():
    import requests
    from itertools import count

    api_url = "http://suggest.divvybikes.com/api/places"
    page_counter = count(1)  # yields 1, 2, 3, ... (start at page 1)
    places = []

    for page_nr in page_counter:
        params = {
            "page": str(page_nr),
            "include_submissions": "true"
        }
        response = requests.get(api_url, params=params)
        response.raise_for_status()  # raise an HTTPError on 4xx/5xx responses
        content = response.json()
        places.extend(content["features"])  # the places on this page
        # metadata["next"] is None on the last page.
        if content["metadata"]["next"] is None:
            break
    return places


def get_comments(place_id):
    import requests

    api_url = "http://suggest.divvybikes.com/api/places/{}/comments".format(place_id)
    response = requests.get(api_url)
    response.raise_for_status()
    return response.json()["results"]  # list of comment objects; may be empty


def main():
    from operator import itemgetter

    places = get_places()
    place_id = places[12]["id"]  # index 12 is the thirteenth place

    print("Printing comments for the thirteenth place (id: {})\n".format(place_id))
    for comment in map(itemgetter("comment"), get_comments(place_id)):
        print(comment)
    return 0


if __name__ == "__main__":
    import sys
    sys.exit(main())
Output:
Printing comments for the thirteenth place (id: 107062)
I contacted Divvy about this five years ago and would like to pick the conversation back up! The Evanston Divvy bikes are regularly spotted in Wilmette and we'd love to expand the system for riders. We could easily have four stations - at the Metra Train Station, and the CTA station, at the lakefront Gillson Park and possibly one at Edens Plaza in west Wilmette. Please, please, please contact me directly. Thanks.
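For this example, I'm printing all the comments for the thirteenth place in the list of places. I chose that one because it is the first place that actually has comments (the places at indices 0 to 11 have none, and most places seem to have no comments at all). In this case, the place has only one comment.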
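Rather than hard-coding index 12, you could search for the first place that has any comments. This is a sketch assuming get_comments behaves as defined above; first_place_with_comments is a hypothetical helper that takes the comment-fetching function as a parameter, so it can be tried with fake data. Note that it still makes one API request per place until it finds a match.

```python
def first_place_with_comments(places, get_comments):
    """Return (index, place, comments) for the first place with comments, or None."""
    for index, place in enumerate(places):
        comments = get_comments(place["id"])
        if comments:  # an empty list means the place has no comments
            return index, place, comments
    return None


# Fake data standing in for the API: only place 107062 has a comment.
fake_places = [{"id": 107098}, {"id": 107097}, {"id": 107062}]
fake_comments = {107062: [{"comment": "Example comment."}]}

result = first_place_with_comments(fake_places, lambda pid: fake_comments.get(pid, []))
print(result[0], result[1]["id"])  # 2 107062
```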
Edit - If you want to save the place ids, latitudes, longitudes and comments in a CSV file, you can try changing the main function to:
def main():
    import csv

    print("Getting places...")
    places = get_places()
    print("Got all places.")

    # GeoJSON stores coordinates as [longitude, latitude], so the columns follow that order.
    fieldnames = ["place id", "longitude", "latitude", "comments"]
    print("Writing to CSV file...")
    # The csv module expects files opened with newline="".
    with open("output.csv", "w", newline="", encoding="utf-8") as file:
        writer = csv.DictWriter(file, fieldnames)
        writer.writeheader()
        num_places_to_write = 25  # only a few places, for demonstration
        for place_nr, place in enumerate(places[:num_places_to_write], start=1):
            print("Writing place #{}/{} with id {}".format(place_nr, num_places_to_write, place["id"]))
            longitude, latitude = place["geometry"]["coordinates"][:2]
            comments = [c["comment"] for c in get_comments(place["id"])]
            writer.writerow({
                "place id": place["id"],
                "longitude": longitude,
                "latitude": latitude,
                "comments": comments,
            })
    return 0
With this, I get results like the following:
place id,longitude,latitude,comments
107098,-87.6711076553,41.9718155716,[]
107097,-87.759540081,42.0121073671,[]
107096,-87.747695446,42.0263916146,[]
107090,-87.6642036438,42.0162096564,[]
107089,-87.6609444613,41.8852953922,[]
107083,-87.6007853815,41.8199433342,[]
107082,-87.6355862613,41.8532736671,[]
107075,-87.6210737228,41.8862644836,[]
107074,-87.6210737228,41.8862644836,[]
107073,-87.6210737228,41.8862644836,[]
107065,-87.6499611139,41.9627251578,[]
107064,-87.6136027649,41.8332984674,[]
107062,-87.7073025402,42.0760990584,"[""I contacted Divvy about this five years ago and would like to pick the conversation back up! The Evanston Divvy bikes are regularly spotted in Wilmette and we'd love to expand the system for riders. We could easily have four stations - at the Metra Train Station, and the CTA station, at the lakefront Gillson Park and possibly one at Edens Plaza in west Wilmette. Please, please, please contact me directly. Thanks.""]"
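If you would rather not store Python list syntax in the comments column, one option is to join the comments into a single string. The sketch below uses a hypothetical feature_to_row helper and writes to an in-memory buffer; it assumes each feature has GeoJSON-style [longitude, latitude] coordinates, and the " | " separator is an arbitrary choice.

```python
import csv
import io

FIELDNAMES = ["place id", "longitude", "latitude", "comments"]


def feature_to_row(feature, comments):
    """Turn one GeoJSON feature plus its comment strings into a CSV row dict."""
    # GeoJSON order is [longitude, latitude]; ignore any optional altitude.
    longitude, latitude = feature["geometry"]["coordinates"][:2]
    return {
        "place id": feature["id"],
        "longitude": longitude,
        "latitude": latitude,
        "comments": " | ".join(comments),  # "" when there are no comments
    }


feature = {"id": 107062, "geometry": {"coordinates": [-87.7073025402, 42.0760990584]}}
buffer = io.StringIO()
writer = csv.DictWriter(buffer, FIELDNAMES, lineterminator="\n")
writer.writeheader()
writer.writerow(feature_to_row(feature, ["First comment", "Second, with a comma"]))
print(buffer.getvalue())
```

The csv writer quotes the comments field automatically when it contains a comma, so the joined string survives a round trip through csv.reader.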
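In this case, I use list slicing (places[:num_places_to_write]) to write only the first 25 places to the CSV file, purely for demonstration purposes. However, after writing the first thirteen, I got this exception message: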
A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond
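So I'm guessing the comments API doesn't like receiving so many requests in such a short time. You may need to sleep for a bit inside the loop to get around this. It's also possible that the API doesn't care at all and the connection just happened to time out.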
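Besides a plain time.sleep(...) after each get_comments call, a sketch of the "sleep in the loop" idea combined with a few retries could look like the following. It is only an assumption that the timeouts are rate-related. call_with_retries and its parameters are hypothetical, and the sleep function is passed in so the logic can be tested without actually waiting. In the real main you would wrap get_comments(place["id"]) this way, passing retry_on=(requests.exceptions.RequestException,).

```python
import time


def call_with_retries(func, *args, attempts=3, delay=1.0, retry_on=(OSError,), sleep=time.sleep):
    """Call func(*args), retrying on the given exceptions with a growing pause.

    The pause doubles after every failure (1s, 2s, 4s, ... by default).
    After the last failed attempt, the exception is re-raised.
    """
    for attempt in range(1, attempts + 1):
        try:
            return func(*args)
        except retry_on:
            if attempt == attempts:
                raise
            sleep(delay * 2 ** (attempt - 1))


# Example with a fake flaky call that fails twice, then succeeds.
calls = []


def flaky(place_id):
    calls.append(place_id)
    if len(calls) < 3:
        raise ConnectionError("timed out")  # built-in ConnectionError is an OSError subclass
    return [{"comment": "ok"}]


pauses = []
print(call_with_retries(flaky, 107062, sleep=pauses.append))  # [{'comment': 'ok'}]
print(pauses)  # [1.0, 2.0]
```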