python - Fail to load a subpart of "open-images-v6" with Fiftyone
Problem description
Context
I'm trying to retrieve a large amount of data to train a CNN. More specifically, I'm looking for pictures of swimming pools. I found a lot of them in the open-images-v6 dataset made by Google. Now I just want to download these particular images (I don't want 9 million images to end up in my download folder).
Problem
To do this, I carefully followed the instructions on the Download page (see: https://storage.googleapis.com/openimages/web/download.html). I installed "fiftyone", tried the "testing" procedure (loading the "quickstart" dataset and browsing the data), and did not encounter any issues.
But when I tried to retrieve the swimming pool images with the following code, I ran into a lot of issues:
import fiftyone as fo
import fiftyone.zoo as foz

dataset = foz.load_zoo_dataset(
    "open-images-v6",
    split="validation",
    label_types="detections",
    classes="Swimming pool"
)

session = fo.launch_app(dataset)
I will skip right to the problem I couldn't figure out: when I run the code, it properly downloads a bunch of .csv files, but when it tries to download the data (the images), it fails with this error:
botocore.exceptions.ClientError: An error occurred (404) when calling the HeadObject operation: Not Found
What I have tried
After hours of searching for the origin of the error, I eventually discovered that it is somehow linked to AWS, but I have no idea what to do about it.
I saw a tutorial online that recommended installing "awscli" via pip, but nothing changed.
I tried to import other datasets with the same procedure (e.g. foz.load_zoo_dataset("coco-2017")) and it seemed to work (at least the download started, but I stopped it early).
Thank you for your time.
Solution
Thank you for the aws hint, that finally got me on the right trail.
Fiftyone builds the download path with Python's os.path.join(), which uses backslashes as separators when running on Windows. S3 object keys use forward slashes, so the backslash-separated path does not match any object in the bucket and the request fails with the 404 error.
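The separator difference can be reproduced on any operating system. The sketch below uses the standard library modules ntpath and posixpath (the Windows and POSIX implementations behind os.path); the image ID is a made-up placeholder, not a real Open Images ID.

```python
import ntpath      # Windows implementation of os.path, importable on any OS
import posixpath   # POSIX implementation of os.path

# Hypothetical example values, chosen only for illustration
split = "validation"
image_id = "example_image_id"

# What os.path.join() produces on Windows
windows_key = ntpath.join(split, image_id + ".jpg")

# What os.path.join() produces on Linux/macOS, which matches the S3 key layout
posix_key = posixpath.join(split, image_id + ".jpg")

print(windows_key)  # validation\example_image_id.jpg
print(posix_key)    # validation/example_image_id.jpg
```

Only the second form uses the forward-slash separator that the bucket's object keys use.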
Since this is a bug in fiftyone itself (I will create a PR to get it fixed), you will need to modify fiftyone yourself.
Go to your Python site-packages directory, then open fiftyone/utils/openimages.py
In this file, add the following code to the import statements:
import re
Then search for the _download_images_if_necessary method and replace this line:
fp_download = os.path.join(split, image_id + ".jpg")
with this one:
fp_download = re.sub(r"\\", "/", os.path.join(split, image_id + ".jpg"))
This did fix the problem for me.
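As a rough check of the replacement above, the sketch below applies the same re.sub() call to a Windows-style path and compares it with posixpath.join(), which is one possible alternative that always joins with forward slashes. The function names and the image ID are hypothetical and are not part of fiftyone.

```python
import ntpath
import posixpath
import re


def key_via_regex(split, image_id):
    # Same substitution as the patch above; ntpath.join stands in for
    # os.path.join on Windows so this runs on any OS.
    return re.sub(r"\\", "/", ntpath.join(split, image_id + ".jpg"))


def key_via_posixpath(split, image_id):
    # Alternative sketch: join with forward slashes regardless of the OS.
    return posixpath.join(split, image_id + ".jpg")


regex_key = key_via_regex("validation", "example_image_id")
posix_key = key_via_posixpath("validation", "example_image_id")

print(regex_key)               # validation/example_image_id.jpg
print(regex_key == posix_key)  # True
```

Both approaches yield the same forward-slash key for this input; the regex version has the advantage of being a one-line change to the existing statement.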