ImageDataBunch.from_df: positional indexers are out-of-bounds

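Question

I'm scratching my head over this one. I don't know how to identify the positional indexers. Am I even passing them?

I'm attempting this for my first Kaggle competition. I can read the CSV into a DataFrame and make the edits I need. I'm now trying to create an ImageDataBunch so I can start training a CNN, but this error pops up no matter which method I try. Any input would be appreciated.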

data = ImageDataBunch.from_df(path, df, ds_tfms=tfms, size=24)
data.classes

Traceback

---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-25-5588812820e8> in <module>
----> 1 data = ImageDataBunch.from_df(path, df, ds_tfms=tfms, size=24)
      2 data.classes

/opt/conda/lib/python3.7/site-packages/fastai/vision/data.py in from_df(cls, path, df, folder, label_delim, valid_pct, seed, fn_col, label_col, suffix, **kwargs)
    117         src = (ImageList.from_df(df, path=path, folder=folder, suffix=suffix, cols=fn_col)
    118                 .split_by_rand_pct(valid_pct, seed)
--> 119                 .label_from_df(label_delim=label_delim, cols=label_col))
    120         return cls.create_from_ll(src, **kwargs)
    121 

/opt/conda/lib/python3.7/site-packages/fastai/data_block.py in _inner(*args, **kwargs)
    477         assert isinstance(fv, Callable)
    478         def _inner(*args, **kwargs):
--> 479             self.train = ft(*args, from_item_lists=True, **kwargs)
    480             assert isinstance(self.train, LabelList)
    481             kwargs['label_cls'] = self.train.y.__class__

/opt/conda/lib/python3.7/site-packages/fastai/data_block.py in label_from_df(self, cols, label_cls, **kwargs)
    283     def label_from_df(self, cols:IntsOrStrs=1, label_cls:Callable=None, **kwargs):
    284         "Label `self.items` from the values in `cols` in `self.inner_df`."
--> 285         labels = self.inner_df.iloc[:,df_names_to_idx(cols, self.inner_df)]
    286         assert labels.isna().sum().sum() == 0, f"You have NaN values in column(s) {cols} of your dataframe, please fix it."
    287         if is_listy(cols) and len(cols) > 1 and (label_cls is None or label_cls == MultiCategoryList):

/opt/conda/lib/python3.7/site-packages/pandas/core/indexing.py in __getitem__(self, key)
   1760                 except (KeyError, IndexError, AttributeError):
   1761                     pass
-> 1762             return self._getitem_tuple(key)
   1763         else:
   1764             # we by definition only have the 0th axis

/opt/conda/lib/python3.7/site-packages/pandas/core/indexing.py in _getitem_tuple(self, tup)
   2065     def _getitem_tuple(self, tup: Tuple):
   2066 
-> 2067         self._has_valid_tuple(tup)
   2068         try:
   2069             return self._getitem_lowerdim(tup)

/opt/conda/lib/python3.7/site-packages/pandas/core/indexing.py in _has_valid_tuple(self, key)
    701                 raise IndexingError("Too many indexers")
    702             try:
--> 703                 self._validate_key(k, i)
    704             except ValueError:
    705                 raise ValueError(

/opt/conda/lib/python3.7/site-packages/pandas/core/indexing.py in _validate_key(self, key, axis)
   2007             # check that the key does not exceed the maximum size of the index
   2008             if len(arr) and (arr.max() >= len_axis or arr.min() < -len_axis):
-> 2009                 raise IndexError("positional indexers are out-of-bounds")
   2010         else:
   2011             raise ValueError(f"Can only index by location with a [{self._valid_types}]")

IndexError: positional indexers are out-of-bounds

Tags: python-3.x, kaggle, fast-ai

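Answer

I ran into this error while creating a DataBunch when my DataFrame/CSV did not have explicitly defined class labels.

I created a dummy column that stores 1 for every row in the DataFrame, and that seems to work. Also make sure to store your independent variable in the second column and the label (the dummy variable in this case) in the first column.

I believe this error occurs when the pandas DataFrame has only one column.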
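To illustrate the one-column case with pandas alone: the traceback shows `label_from_df` defaulting to `cols=1` and then indexing `self.inner_df.iloc[:, ...]`, so a frame without a column at position 1 has nothing to select. The sketch below is only an approximation of that lookup. The frame contents and the helper `select_label_column` are made up for illustration and are not part of fastai.

```python
import pandas as pd

# Hypothetical one-column frame: file names only, no label column.
df = pd.DataFrame({"name": ["a.png", "b.png", "c.png"]})

def select_label_column(frame, cols=1):
    """Approximate the positional lookup from the traceback: frame.iloc[:, [cols]]."""
    return frame.iloc[:, [cols]]

try:
    select_label_column(df)
    error_message = None
except IndexError as exc:
    # Position 1 does not exist in a one-column frame.
    error_message = str(exc)

# Adding a label column at position 1 makes the same lookup valid.
df_fixed = df.assign(label=1)
labels = select_label_column(df_fixed)

print(error_message)
print(list(labels.columns))
```

Checking `df.shape` before building the DataBunch is a quick way to confirm that the column you expect to hold the labels is actually there.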

Thanks.

Code:

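import pandas as pd
from fastai.text import *  # fastai v1
# index=False keeps the label in column 0 and the text in column 1
df = pd.DataFrame(lines, columns=["dummy_value", "text"])
df.to_csv("./train.csv", index=False)
data_lm = TextLMDataBunch.from_csv(path, "train.csv", min_freq=1)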
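
One thing to watch when writing the CSV: `DataFrame.to_csv` also writes the row index by default, which shifts every column one place to the right when the file is read back. The pandas-only sketch below (the two sample rows and the helper `columns_after_round_trip` are made up) shows how passing `index=False` keeps the label first and the text second. It assumes the loader picks columns by position, as the answer describes.

```python
import io
import pandas as pd

# Made-up sample rows, laid out as in the answer: label first, text second.
df = pd.DataFrame([[1, "first line"], [1, "second line"]],
                  columns=["dummy_value", "text"])

def columns_after_round_trip(frame, **to_csv_kwargs):
    """Write the frame to an in-memory CSV and return the columns read back."""
    buffer = io.StringIO()
    frame.to_csv(buffer, **to_csv_kwargs)
    buffer.seek(0)
    return list(pd.read_csv(buffer).columns)

# Default: the index is written as an extra, unnamed first column.
cols_default = columns_after_round_trip(df)
# index=False: only the two real columns are written.
cols_no_index = columns_after_round_trip(df, index=False)

print(cols_default)
print(cols_no_index)
```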

Note: this is my first attempt at answering a StackOverflow question. I hope it helps!

