Expected string or bytes-like object - Amazon dataset

Problem description

I am working on an Amazon dataset that I want to feed into an LSTM. The code is as follows:

def data_clean( rev, remove_stopwords=True): 


new_text = re.sub("[^a-zA-Z]"," ", rev)

words = new_text.lower().split()

if remove_stopwords:
    sts = set(stopwords.words("english"))
    words = [w for w in words if not w in sts]
    return words
ary=[]
eng_stemmer = english_stemmer 
for word in words:
    ary.append(eng_stemmer.stem(word))


return ary

However, as soon as I run clean_reviewData and clean_summarydata, it returns an error: expected string or bytes-like object.

Can someone help me correct the code?

Tags: python

Solution


You did not format the code correctly, but I assume your function looks like this:

def data_clean( rev, remove_stopwords=True): 
    new_text = re.sub("[^a-zA-Z]"," ", rev)
    words = new_text.lower().split()
    if remove_stopwords:
        sts = set(stopwords.words("english"))
        words = [w for w in words if not w in sts]

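You forgot `return words` at the end, so the function implicitly runs `return None`. The code that calls `data_clean(rev)` then receives `None` instead of a list of words. Passing that `None` on to something that needs a string, such as `re.sub`, raises `expected string or bytes-like object`, because `None` is not a string or bytes-like object. (`" ".join(None)` fails as well, but with a different message: `can only join an iterable`.)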
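To see the effect in isolation, here is a minimal sketch. The function name `clean_without_return` and the sample string are made up for illustration; only `re.sub` comes from the original code. A function with no `return` statement yields `None`, and handing `None` to `re.sub` reproduces the error message.

```python
import re

def clean_without_return(rev):
    # Hypothetical stand-in for data_clean with the return statement missing.
    new_text = re.sub("[^a-zA-Z]", " ", rev)
    words = new_text.lower().split()
    # No return here, so Python returns None implicitly.

result = clean_without_return("Great product!!")
print(result)  # None

try:
    # Passing None where re expects a string raises TypeError.
    re.sub("[^a-zA-Z]", " ", result)
except TypeError as exc:
    error_message = str(exc)
    print(error_message)
```

The exact wording of the message varies slightly between Python versions, but it starts with "expected string or bytes-like object".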

You need:

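import re
from nltk.corpus import stopwords  # assumes NLTK; run nltk.download("stopwords") once

def data_clean( rev, remove_stopwords=True): 
    # replace every non-letter character with a space
    new_text = re.sub("[^a-zA-Z]"," ", rev)
    words = new_text.lower().split()

    if remove_stopwords:
        sts = set(stopwords.words("english"))
        words = [w for w in words if w not in sts]

    # return at function level, so a list comes back in every case
    return words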
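The same message also appears when `rev` itself is not a string, for example `None` or a float `NaN` from a missing entry in the dataset, because `re.sub` rejects any non-string input. Whether that applies to your data is an assumption on my part. The sketch below shows one possible guard. The name `data_clean_safe` and the tiny `STOPWORDS` set are hypothetical stand-ins (real code would use `stopwords.words("english")`), chosen so the example runs without any downloads.

```python
import re

# Hypothetical stand-in for set(stopwords.words("english")) from NLTK.
STOPWORDS = {"the", "is", "a", "and", "this"}

def data_clean_safe(rev, remove_stopwords=True):
    # Non-string input (None, NaN, numbers) yields an empty word list
    # instead of raising TypeError inside re.sub.
    if not isinstance(rev, str):
        return []
    new_text = re.sub("[^a-zA-Z]", " ", rev)
    words = new_text.lower().split()
    if remove_stopwords:
        words = [w for w in words if w not in STOPWORDS]
    return words

print(data_clean_safe("This is a GREAT product!"))  # ['great', 'product']
print(data_clean_safe(None))                        # []
print(data_clean_safe(float("nan")))                # []
```

Returning an empty list keeps `" ".join(data_clean_safe(rev))` working for every row. Whether silently skipping bad rows is acceptable depends on your data.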
