pandas - How do you apply the Porter stemmer to a pandas df?
Question
Hi everyone. I'm having trouble stemming everything in a pandas df. This is what I'm trying:
df['txt'] = pos_tag(word_tokenize(df['txt']))
The error returned is:
TypeError: expected string or bytes-like object
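For context, this error happens because tokenizers like word_tokenize expect a single string, and passing a whole Series fails inside the underlying regex machinery. A minimal sketch of the same failure, using only the standard library's re module as a stand-in tokenizer (the sample sentence is illustrative):

```python
import re
import pandas as pd

df = pd.DataFrame({"txt": ["I am the greatest. I am liked"]})

# Passing the whole Series to a plain-string function fails the same way
# word_tokenize does: the regex layer wants a str, not a Series.
try:
    re.findall(r"\w+", df["txt"])
except TypeError as err:
    print(err)  # e.g. "expected string or bytes-like object"

# Applying the function element-wise works, because each element is a str.
tokens = df["txt"].apply(lambda s: re.findall(r"\w+", s))
print(tokens[0])
```

The fix is the same for any per-string NLTK function: map it over the column with apply rather than calling it on the column itself.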
Solution
You haven't shared your data, and pos_tag is not defined, but from your title I assume it's actually porter_stemmer you're referring to. Now, suppose you have the following dataframe:
id txt
0 1 I am the greatest. I am liked
1 2 You are the best. You are loved.
2 3 3
3 4 Why is that so? Chocolates.
4 5 It tried me!
5 5 do it! He retrieves the dogs!
6 6 Why not ? He rocketed to the stars.
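For reproducibility, the frame above can be built directly, with the values copied from the printout:

```python
import pandas as pd

# Sample dataframe, values taken from the printout above.
df = pd.DataFrame({
    "id": [1, 2, 3, 4, 5, 5, 6],
    "txt": [
        "I am the greatest. I am liked",
        "You are the best. You are loved.",
        "3",
        "Why is that so? Chocolates.",
        "It tried me!",
        "do it! He retrieves the dogs!",
        "Why not ? He rocketed to the stars.",
    ],
})
print(df)
```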
Then tokenize and stem in two steps:
import nltk
import pandas as pd
from nltk.stem import PorterStemmer

porter_stemmer = PorterStemmer()

# Tokenize each row, then stem every token in the resulting list.
df['tokenized_sentence'] = df.apply(lambda row: nltk.word_tokenize(row['txt']), axis=1)
df['stem'] = df['tokenized_sentence'].apply(lambda x: [porter_stemmer.stem(y) for y in x])
which returns:
id txt \
0 1 I am the greatest. I am liked
1 2 You are the best. You are loved.
2 3 3
3 4 Why is that so? Chocolates.
4 5 It tried me!
5 5 do it! He retrieves the dogs!
6 6 Why not ? He rocketed to the stars.
tokenized_sentence \
0 [I, am, the, greatest, ., I, am, liked]
1 [You, are, the, best, ., You, are, loved, .]
2 [3]
3 [Why, is, that, so, ?, Chocolates, .]
4 [It, tried, me, !]
5 [do, it, !, He, retrieves, the, dogs, !]
6 [Why, not, ?, He, rocketed, to, the, stars, .]
stem
0 [I, am, the, greatest, ., I, am, like]
1 [you, are, the, best, ., you, are, love, .]
2 [3]
3 [why, is, that, so, ?, chocolate, .]
4 [It, tri, me, !]
5 [do, it, !, He, retriev, the, dog, !]
6 [why, not, ?, He, rocket, to, the, star, .]
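If you want the stems back as a plain string column rather than lists of tokens, you can join each list afterwards. A minimal sketch; the whitespace str.split tokenizer (used so this runs without the punkt data download) and the stemmed_txt column name are illustrative, not part of the answer above:

```python
import pandas as pd
from nltk.stem import PorterStemmer

porter_stemmer = PorterStemmer()

df = pd.DataFrame({"txt": ["You are the best. You are loved."]})

# A plain whitespace split stands in for nltk.word_tokenize here,
# so punctuation stays attached to its word.
df["stem"] = df["txt"].str.split().apply(
    lambda toks: [porter_stemmer.stem(t) for t in toks]
)

# Join each list of stems back into a single string.
df["stemmed_txt"] = df["stem"].str.join(" ")
print(df["stemmed_txt"][0])
```

Note that word_tokenize separates punctuation into its own tokens, so the two-step version in the answer gives cleaner stems than this whitespace split.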