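python - 'NoneType' object has no attribute 'lower' - error when cleaning text
Problem description
Below is the code I am running in Databricks, and below that is the error.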
data = d.select("*").toPandas()
train, test = train_test_split(data, test_size = .20, random_state = True)
train['set'] = 'train'
test['set'] = 'test'
data = pd.concat([train,test], ignore_index=True)
def clean_text(text):
    return "".join([c for c in text.lower() if c not in punctuation])
data['text_cleaned'] = data['text'].map(clean_text)
tfidf = TfidfVectorizer()
tfidf.fit(data['text_cleaned'])
Error:
AttributeError: 'NoneType' object has no attribute 'lower'
/local_disk0/tmp/1582551158268-0/PythonShell.py:3: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
/local_disk0/tmp/1582551158268-0/PythonShell.py:4: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
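Why the error happens: a null in the Spark text column arrives in pandas as None after toPandas(), and None has no lower() method, so clean_text fails on the first missing value. A minimal pure-Python reproduction, assuming that is what the column contains (the sample values are made up, and try_clean is a hypothetical helper used only for the demonstration):

```python
from string import punctuation


def clean_text(text):
    # Same logic as in the question: no guard against missing values
    return "".join([c for c in text.lower() if c not in punctuation])


def try_clean(values):
    # Collect either the cleaned string or the error message for each value
    results = []
    for value in values:
        try:
            results.append(clean_text(value))
        except AttributeError as exc:
            results.append(f"error: {exc}")
    return results


# Made-up sample: the middle value stands in for a null from the Spark column
print(try_clean(["Hello, World!", None, "It's fine."]))
# → ['hello world', "error: 'NoneType' object has no attribute 'lower'", 'its fine']
```

The SettingWithCopyWarning lines in the output are separate warnings and do not cause the AttributeError.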
Solution
You can handle the None values (missing text) before calling lower():
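import pandas as pd
from string import punctuation  # assumed source of the punctuation set
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import TfidfVectorizer

# d is the Spark DataFrame from the question
data = d.select("*").toPandas()
train, test = train_test_split(data, test_size = .20, random_state = True)
train['set'] = 'train'
test['set'] = 'test'
data = pd.concat([train,test], ignore_index=True)
def clean_text(text):
    # Check for None before calling lower(); missing text becomes ""
    if text is None:
        return ""
    return "".join([c for c in text.lower() if c not in punctuation])
data['text_cleaned'] = data['text'].map(clean_text)
tfidf = TfidfVectorizer()
tfidf.fit(data['text_cleaned'])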
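The None check has to run before text.lower() is called: in a list comprehension the iterable (here text.lower()) is evaluated first, so a condition placed in the comprehension's if clause comes too late to prevent the error. Missing values can also be handled at the pandas level before mapping. A minimal sketch, assuming the nulls arrive in the pandas text column as None (the sample DataFrame is hypothetical): either fill them with an empty string, or drop those rows and take an explicit .copy().

```python
from string import punctuation

import pandas as pd


def clean_text(text):
    # Same cleaning as in the question; expects a real string
    return "".join([c for c in text.lower() if c not in punctuation])


# Hypothetical stand-in for d.select("*").toPandas(): the null becomes None
data = pd.DataFrame({"text": ["Good product!", None, "Bad, very bad."]})

# Option 1: keep every row and treat missing text as an empty string
filled = data.assign(text=data["text"].fillna(""))
filled["text_cleaned"] = filled["text"].map(clean_text)

# Option 2: drop rows with missing text; .copy() gives an independent frame,
# so the column assignment below does not trigger SettingWithCopyWarning
dropped = data.dropna(subset=["text"]).copy()
dropped["text_cleaned"] = dropped["text"].map(clean_text)

print(filled["text_cleaned"].tolist())   # → ['good product', '', 'bad very bad']
print(dropped["text_cleaned"].tolist())  # → ['good product', 'bad very bad']
```

Filling keeps the row count unchanged (useful if the rows still carry labels you need), while dropping removes documents that have no text at all. Calling .copy() on train and test before adding the set column should likewise stop the SettingWithCopyWarning lines shown in the question's output.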