首页 > 解决方案 > 从具有多个列名的数据帧流入 y_col 生成类型错误

问题描述

我正在使用来自数据框的流来处理具有 14 个可能标签的多标签分类问题,所有列名都以字符串格式放置在列表中,例如:

columns = ["No Finding", "Enlarged Cardiomediastinum", "Cardiomegaly", "Lung Opacity", "Lung      Lesion","Edema", "Consolidation", "Pneumonia", "Atelectasis", "Pneumothorax", "Pleural Effusion", "Pleural Other", "Fracture", "Support Devices"]

然后将列表名称(列)输入 y_col 例如:

train_generator=datagen.flow_from_dataframe(
dataframe=df[:178731],
directory='/home/admin1/Downloads/',
x_col='Path',
y_col=columns,
batch_size=batch_size,
seed=42,
shuffle=True,
target_size=(224, 224))

我收到此错误:

TypeError: If class_mode="categorical", y_col="['No Finding', 'Enlarged Cardiomediastinum', 'Cardiomegaly', 'Lung Opacity', 'Lung Lesion', 'Edema', 'Consolidation', 'Pneumonia', 'Atelectasis', 'Pneumothorax', 'Pleural Effusion', 'Pleural Other', 'Fracture', 'Support Devices']" column values must be type string, list or tuple.

我已经尝试过之前提出的解决方案:

df['No Finding'] = df['No Finding'].astype(str)
df['Enlarged Cardiomediastinum'] = df['Enlarged Cardiomediastinum'].astype(str)
df['Cardiomegaly'] = df['Cardiomegaly'].astype(str)
df['Lung Opacity'] = df['Lung Opacity'].astype(str)
df['Lung Lesion'] = df['Lung Lesion'].astype(str)
df['Edema'] = df['Edema'].astype(str)
df['Consolidation'] = df['Consolidation'].astype(str)
df['Pneumonia'] = df['Pneumonia'].astype(str)
df['Atelectasis'] = df['Atelectasis'].astype(str)
df['Pneumothorax'] = df['Pneumothorax'].astype(str)
df['Pleural Effusion'] = df['Pleural Effusion'].astype(str)
df['Pleural Other'] = df['Pleural Other'].astype(str)
df['Fracture'] = df['Fracture'].astype(str)
df['Support Devices'] = df['Support Devices'].astype(str)

它仅在我向 y_col 提供单个列名时才有效。我正在使用 keras 2.2.4,我已经卸载了 keras.preprocessing 并安装了 github 版本。似乎来自目录函数的流不支持使用默认类模式作为分类将多个列名以列表格式馈送到 y_col,因为这是一个多标签分类问题。我怀疑类型问题源于熊猫数据帧值仅被转换为对象,而 keras 预处理数据帧迭代器代码仅允许字符串、列表或元组,但熊猫不会直接将字符串转换为对象。下面是我的代码:

df=pd.read_csv('/home/admin1/Downloads/CheXpert-v1.0/train.csv')

df = df.replace(np.nan, 0)
df['No Finding'].head()

df['No Finding'] = df['No Finding'].astype(str)
df['Enlarged Cardiomediastinum'] = df['Enlarged Cardiomediastinum'].astype(str)
df['Cardiomegaly'] = df['Cardiomegaly'].astype(str)
df['Lung Opacity'] = df['Lung Opacity'].astype(str)
df['Lung Lesion'] = df['Lung Lesion'].astype(str)
df['Edema'] = df['Edema'].astype(str)
df['Consolidation'] = df['Consolidation'].astype(str)
df['Pneumonia'] = df['Pneumonia'].astype(str)
df['Atelectasis'] = df['Atelectasis'].astype(str)
df['Pneumothorax'] = df['Pneumothorax'].astype(str)
df['Pleural Effusion'] = df['Pleural Effusion'].astype(str)
df['Pleural Other'] = df['Pleural Other'].astype(str)
df['Fracture'] = df['Fracture'].astype(str)
df['Support Devices'] = df['Support Devices'].astype(str)
df['Age'] = df['Age'].astype(str)

df.dtypes

columns=["No Finding", "Enlarged Cardiomediastinum", "Cardiomegaly", "Lung Opacity",
"Lung Lesion","Edema", "Consolidation", "Pneumonia", "Atelectasis",
"Pneumothorax", "Pleural Effusion", "Pleural Other", "Fracture",
"Support Devices"]

datagen=ImageDataGenerator(rescale=1./255.)
test_datagen=ImageDataGenerator(rescale=1./255.)

train_generator=datagen.flow_from_dataframe(
dataframe=df[:178731],
directory='/home/admin1/Downloads/',
x_col='Path',
y_col=columns,
batch_size=batch_size,
seed=42,
shuffle=True,
target_size=(224, 224))

标签: keras

解决方案


我遇到了同样的问题,并且能够通过将 class_mode 参数更改为“其他”来解决它。在关注 flow_from_dataframe() 的 tensorflow 文档中的几个链接后,我浏览了本教程。

因此,根据您上面的内容,您只需将您的 class_mode 直接设置为“其他”,它应该可以工作。

train_generator=datagen.flow_from_dataframe(
dataframe=df[:178731],
directory='/home/admin1/Downloads/',
x_col='Path',
y_col=columns,
batch_size=batch_size,
class_mode='raw'
seed=42,
shuffle=True,
target_size=(224, 224))

不过我应该说,我在 tensorflow 或 keras 文档中都没有看到 class_mode 'other' 的提及。但是,它似乎确实有效,所以我现在正在使用它。

编辑:我已经意识到“其他”在当前版本的 keras 中被贬低了。我已经更新了上面的代码以反映新的正确 class_mode 应该是“原始”。


推荐阅读