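python-3.x - ImageDataBunch.from_df: positional indexers are out-of-bounds
Problem description
I'm scratching my head over this one. I don't know how to identify the positional indexers. Am I even passing them?
I'm attempting this for my first Kaggle competition. I can read the CSV into a DataFrame and make the edits I need. I'm trying to create an ImageDataBunch so I can start training a CNN, but this error pops up whichever method I try. Any input would be appreciated.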
data = ImageDataBunch.from_df(path, df, ds_tfms=tfms, size=24)
data.classes
Traceback
---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
<ipython-input-25-5588812820e8> in <module>
----> 1 data = ImageDataBunch.from_df(path, df, ds_tfms=tfms, size=24)
2 data.classes
/opt/conda/lib/python3.7/site-packages/fastai/vision/data.py in from_df(cls, path, df, folder, label_delim, valid_pct, seed, fn_col, label_col, suffix, **kwargs)
117 src = (ImageList.from_df(df, path=path, folder=folder, suffix=suffix, cols=fn_col)
118 .split_by_rand_pct(valid_pct, seed)
--> 119 .label_from_df(label_delim=label_delim, cols=label_col))
120 return cls.create_from_ll(src, **kwargs)
121
/opt/conda/lib/python3.7/site-packages/fastai/data_block.py in _inner(*args, **kwargs)
477 assert isinstance(fv, Callable)
478 def _inner(*args, **kwargs):
--> 479 self.train = ft(*args, from_item_lists=True, **kwargs)
480 assert isinstance(self.train, LabelList)
481 kwargs['label_cls'] = self.train.y.__class__
/opt/conda/lib/python3.7/site-packages/fastai/data_block.py in label_from_df(self, cols, label_cls, **kwargs)
283 def label_from_df(self, cols:IntsOrStrs=1, label_cls:Callable=None, **kwargs):
284 "Label `self.items` from the values in `cols` in `self.inner_df`."
--> 285 labels = self.inner_df.iloc[:,df_names_to_idx(cols, self.inner_df)]
286 assert labels.isna().sum().sum() == 0, f"You have NaN values in column(s) {cols} of your dataframe, please fix it."
287 if is_listy(cols) and len(cols) > 1 and (label_cls is None or label_cls == MultiCategoryList):
/opt/conda/lib/python3.7/site-packages/pandas/core/indexing.py in __getitem__(self, key)
1760 except (KeyError, IndexError, AttributeError):
1761 pass
-> 1762 return self._getitem_tuple(key)
1763 else:
1764 # we by definition only have the 0th axis
/opt/conda/lib/python3.7/site-packages/pandas/core/indexing.py in _getitem_tuple(self, tup)
2065 def _getitem_tuple(self, tup: Tuple):
2066
-> 2067 self._has_valid_tuple(tup)
2068 try:
2069 return self._getitem_lowerdim(tup)
/opt/conda/lib/python3.7/site-packages/pandas/core/indexing.py in _has_valid_tuple(self, key)
701 raise IndexingError("Too many indexers")
702 try:
--> 703 self._validate_key(k, i)
704 except ValueError:
705 raise ValueError(
/opt/conda/lib/python3.7/site-packages/pandas/core/indexing.py in _validate_key(self, key, axis)
2007 # check that the key does not exceed the maximum size of the index
2008 if len(arr) and (arr.max() >= len_axis or arr.min() < -len_axis):
-> 2009 raise IndexError("positional indexers are out-of-bounds")
2010 else:
2011 raise ValueError(f"Can only index by location with a [{self._valid_types}]")
IndexError: positional indexers are out-of-bounds
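Solution
I ran into this error when creating a DataBunch from a DataFrame/CSV that had no explicitly defined class labels.
I created a dummy column that stores 1 for every row in the DataFrame, and it seems to work. Also make sure the independent variable is stored in the second column and the label (here, the dummy variable) in the first column.
I believe this error occurs when the pandas DataFrame has only one column.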
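The one-column explanation can be checked with pandas alone. The traceback shows `label_from_df` defaulting to `cols=1` and passing it to `.iloc` as a list of positions, so a frame with a single column has no position 1 to select. The sketch below is a minimal reproduction under that assumption; the frame contents and the column names `name` and `label` are hypothetical.

```python
import pandas as pd

# Hypothetical one-column frame: file names only, no label column.
df = pd.DataFrame({"name": ["a.png", "b.png", "c.png"]})

# Positional index of the label column (assumed to mirror cols=1 in the traceback).
label_cols = [1]

try:
    df.iloc[:, label_cols]
    error = None
except IndexError as exc:
    # Only column 0 exists, so position 1 is out of bounds.
    error = exc
print(repr(error))

# Adding a dummy label column makes position 1 valid.
df["label"] = 1
labels = df.iloc[:, label_cols]
print(labels.columns.tolist())
```

Printing `df.columns` and `df.shape` before building the DataBunch is a quick way to confirm that the expected label position actually exists.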
Thanks.
Code:
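import pandas as pd
from fastai.text import TextLMDataBunch
# Dummy label goes in the first column, text in the second
df = pd.DataFrame(lines, columns=["dummy_value", "text"])
df.to_csv("./train.csv", index=False)  # index=False keeps the label in column 0
data_lm = TextLMDataBunch.from_csv(path, "train.csv", min_freq=1)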
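One thing worth checking in the CSV round trip: `DataFrame.to_csv` writes the row index as an extra leading column unless `index=False` is passed, which shifts every positional column by one when the file is read back. The sketch below shows the difference using an in-memory buffer. The sample rows stand in for `lines` and are hypothetical, as is the `round_trip` helper.

```python
import io
import pandas as pd

# Hypothetical sample data standing in for `lines` from the answer.
lines = [[1, "first sentence"], [1, "second sentence"]]
df = pd.DataFrame(lines, columns=["dummy_value", "text"])

def round_trip(frame, **to_csv_kwargs):
    """Write the frame to an in-memory CSV and read it back."""
    buf = io.StringIO()
    frame.to_csv(buf, **to_csv_kwargs)
    buf.seek(0)
    return pd.read_csv(buf)

with_index = round_trip(df)                   # default: index is written
without_index = round_trip(df, index=False)   # index is omitted

print(list(with_index.columns))     # index appears as an extra first column
print(list(without_index.columns))  # label stays at position 0, text at 1
```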
Note: this is my first attempt at answering a StackOverflow question. Hope it helps!