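python-3.x - Multi-class text classification: TypeError: Input must be a SparseTensor
Problem description
I am trying to build a deep learning model for text classification. However, when I run the script below, I get this error: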
InvalidArgumentError: indices[2] = [0,398] is out of order. Many sparse ops require sorted indices. Use `tf.sparse.reorder` to create a correctly ordered copy.
However, when I tried using tf.sparse.reorder, I got a different error: TypeError: Input must be a SparseTensor.
These are the dimensions of the inputs:
X_train_cv1.shape, y_train.shape, X_validation_cv1.shape, y_validation.shape
((13435, 675), (13435, 3), (3359, 675), (3359, 3))
Is there a way to fix this?
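# Imports assumed by this script
import tensorflow as tf
from keras import layers
from keras.models import Sequential
from keras.utils import np_utils
from sklearn.preprocessing import LabelEncoder
# Split the data into training and validation sets
from sklearn.model_selection import train_test_split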
X_train, X_validation, y_train, y_validation = train_test_split(X, y, test_size=0.2, random_state=42)
# encode class values as integers
encoder = LabelEncoder()
encoder.fit(y_train)
encoded_y_train = encoder.transform(y_train)
# convert integers to dummy variables (i.e. one hot encoded)
y_train= np_utils.to_categorical(encoded_y_train)
# encode the validation labels with the encoder fitted on the training labels,
# so both splits share the same class-to-integer mapping
encoded_y_validation = encoder.transform(y_validation)
# convert integers to dummy variables (i.e. one hot encoded)
y_validation= np_utils.to_categorical(encoded_y_validation)
# The first document-term matrix has default Count Vectorizer values - counts of bigrams
from sklearn.feature_extraction.text import CountVectorizer
cv1 = CountVectorizer(analyzer='char',ngram_range=(2, 2))
X_train_cv1 = cv1.fit_transform(X_train)
X_validation_cv1 = cv1.transform(X_validation)
input_dim = X_train_cv1.shape[1] # Number of features
model = Sequential()
model.add(layers.Dense(10, input_dim=input_dim, activation='relu'))
model.add(layers.Dense(3, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
model.summary()
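# Attempted fix for the InvalidArgumentError. The first call already raises
# "TypeError: Input must be a SparseTensor", because X_train_cv1 is a SciPy
# sparse matrix and y_train is a NumPy array, not a tf.SparseTensor.
X_train_cv1 = tf.sparse.reorder(X_train_cv1)
y_train = tf.sparse.reorder(y_train)
X_validation_cv1 = tf.sparse.reorder(X_validation_cv1)
y_validation = tf.sparse.reorder(y_validation)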
history = model.fit(X_train_cv1, y_train,epochs=100,verbose=True,validation_data=(X_validation_cv1, y_validation),batch_size=10)
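For context, the TypeError comes from the types of the objects involved: tf.sparse.reorder only accepts a tf.SparseTensor, while CountVectorizer returns a SciPy sparse matrix and one-hot labels are a plain NumPy array. A minimal sketch that checks this on a toy corpus (the `docs` list and the `y_onehot` array are hypothetical stand-ins for the real data, not part of the original script):

```python
import numpy as np
from scipy import sparse
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical toy corpus standing in for X_train.
docs = ["good movie", "bad movie", "okay plot"]

# Same vectorizer settings as in the script: character bigrams.
cv = CountVectorizer(analyzer="char", ngram_range=(2, 2))
X_sparse = cv.fit_transform(docs)

# Hypothetical stand-in for the one-hot labels built with to_categorical.
y_onehot = np.eye(3, dtype="float32")

print(type(X_sparse))              # a scipy.sparse matrix class
print(sparse.issparse(X_sparse))   # True
print(type(y_onehot))              # numpy.ndarray
```

Neither object is a tf.SparseTensor, so tf.sparse.reorder rejects both of them.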
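This is my dataset
Solution
OK, I managed to find the answer. Apparently Keras does not handle SciPy sparse matrices well, so I just had to add .toarray() to these lines of my code so that the vectorizer output becomes a dense NumPy array.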
X_train_cv1 = cv1.fit_transform(X_train).toarray()
X_validation_cv1 = cv1.transform(X_validation).toarray()
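The two lines above can be tried in isolation with a self-contained sketch. The toy `docs` and `val_docs` lists are hypothetical stand-ins for the real data. The second option, sorting the sparse indices in place with SciPy's `sort_indices()`, is an assumption offered as an alternative when a dense array would be too large; whether a given Keras version then accepts the sparse matrix in `model.fit` depends on the installed version and is not confirmed by the original answer.

```python
import numpy as np
from scipy import sparse
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical toy data standing in for X_train and X_validation.
docs = ["good movie", "bad movie", "okay plot", "great plot"]
val_docs = ["good plot"]

cv = CountVectorizer(analyzer="char", ngram_range=(2, 2))

# Option 1 (the fix from the answer): convert to dense NumPy arrays.
X_train_dense = cv.fit_transform(docs).toarray()
X_val_dense = cv.transform(val_docs).toarray()
print(type(X_train_dense), X_train_dense.shape)

# Option 2 (assumption, version dependent): keep the matrix sparse but
# sort the column indices of each row in place before handing it on.
X_train_sparse = cv.fit_transform(docs)
X_train_sparse.sort_indices()
print(sparse.issparse(X_train_sparse), X_train_sparse.has_sorted_indices)
```

For the shapes in the question, 13435 x 675 is roughly 9 million entries, so the dense arrays fit comfortably in memory. With a much larger vocabulary the dense conversion may become the bottleneck, which is when keeping the data sparse is worth investigating.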