python-applymap - Get the maximum word length from a paragraph
Problem description
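I am working on a text problem. My pandas DataFrame has many columns, one of which consists of paragraphs. In the output I need three columns, defined as follows -
- The length of the longest word
- The number of longest words (if several words share that length)
- The total number of such equal-length words.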
I count something as a word if it is separated by whitespace. I am looking for an answer that uses Python apply/map.
Here is some sample input data -
import pandas as pd

df = pd.DataFrame({'text': [
    "that's not where the biggest opportunity is - it's with heart failure drug - very very huge market....",
    "Of course! I just got diagnosed with congestive heart failure and type 2 diabetes. I smoked for 12 years and ate like crap for about the same time. I quit smoking and have been on a diet for a few weeks now. Let me assure you that I'd rather have a coke, gummi bears, and a bag of cheez doodles than a pack of cigs right now. Addiction is addiction.",
    "STILLWATER, Okla. (AP) ? Medical examiner spokeswoman SpokesWoman: Oklahoma State player Tyrek Coger died of enlarged heart, manner of death ruled natural."
]})
df
text
0 that's not where the biggest opportunity is - ...
1 Of course! I just got diagnosed with congestiv...
2 STILLWATER, Okla. (AP) ? Medical examiner spok...
Here is the expected output -
                                                text  word_count  word_length                    words
0  that's not where the biggest opportunity is - ...           1           11              opportunity
1  Of course! I just got diagnosed with congestiv...           1           10               congestive
2  STILLWATER, Okla. (AP) ? Medical examiner spok...           2           11  spokeswoman SpokesWoman
Solution
The following code should solve the problem:
def get_values(text):
    tokens = text.split()  # Split on whitespace
    max_word_length = -1
    list_words = []  # Words that have the maximum length seen so far
    for token in tokens:
        if len(token) > max_word_length:
            max_word_length = len(token)
            list_words = []  # Clear the list, since there is a new maximum
            list_words.append(token)
        elif len(token) == max_word_length:
            list_words.append(token)
    words_string = ' '.join(list_words)  # Join the list into one string (also safe when the text is empty)
    return [len(list_words), max_word_length, words_string]

df['word_count'], df['word_length'], df['words'] = zip(*df['text'].map(get_values))
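The final assignment relies on `zip(*...)` to turn one result per row into one sequence per column. Here is a minimal sketch of that step on its own, using plain Python and the values from the expected output above. The `rows` variable is a hypothetical stand-in for what `df['text'].map(get_values)` yields.

```python
# Hypothetical per-row results (count, length, words), copied from the
# expected output in the question.
rows = [
    (1, 11, 'opportunity'),
    (1, 10, 'congestive'),
    (2, 11, 'spokeswoman SpokesWoman'),
]

# zip(*rows) transposes the rows: one tuple per output column.
word_count, word_length, words = zip(*rows)

print(word_count)   # (1, 1, 2)
print(word_length)  # (11, 10, 11)
print(words)        # ('opportunity', 'congestive', 'spokeswoman SpokesWoman')
```

Each resulting tuple has one entry per row, which is why it can be assigned directly to a DataFrame column.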
Edit: forgot to join the list.
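Note that splitting on whitespace alone keeps punctuation attached to each token. In the third sample row, "SpokesWoman:" is then 12 characters long, while the expected output lists two words of length 11. If the goal is to reproduce that expected output, one possible variant (an assumption about the intent, not part of the original answer) trims leading and trailing punctuation before measuring. The function name `longest_words_stripped` is hypothetical.

```python
import string

def longest_words_stripped(text):
    """Return (count, length, words) for the longest whitespace-separated
    tokens, after trimming punctuation from both ends of each token."""
    # Assumption: only punctuation at the ends of a token is removed,
    # so the apostrophe inside "that's" is kept.
    tokens = [t.strip(string.punctuation) for t in text.split()]
    tokens = [t for t in tokens if t]  # drop tokens that were pure punctuation, e.g. "-"
    if not tokens:
        return 0, 0, ''
    max_len = max(len(t) for t in tokens)
    longest = [t for t in tokens if len(t) == max_len]
    return len(longest), max_len, ' '.join(longest)

sample = "Medical examiner spokeswoman SpokesWoman: Oklahoma State player"
print(longest_words_stripped(sample))  # (2, 11, 'spokeswoman SpokesWoman')
```

It can be applied to the DataFrame in the same way as the function above, by passing `longest_words_stripped` to `df['text'].map(...)` and unpacking the results with `zip(*...)`.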