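pandas - Count unigrams and bigrams if they match words in a list
Problem description
I am trying to count all the words in a string that appear in a list. However, my code only works for unigrams, not for bigrams or trigrams. How can I count all three?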
import pandas as pd

text_df = [['take time'], ['spend extra time adjust schedule'], ['work more hour extend hour']]
text_df = pd.DataFrame(text_df, columns=['answer1'])
time_words = ['time', 'extra time', 'adjust schedule', 'extend']
# I tried this, but it only counts unigrams: x.split() yields single words only
text_df['time_wordcount'] = text_df['answer1'].apply(lambda x: len([wrd for wrd in x.split() if wrd in time_words]))
# Update: this works, but it is really long-winded
text_df['split'] = text_df['answer1'].apply(lambda x: x.split())  # unigrams
text_df['split2'] = text_df['split'].apply(lambda x: [' '.join(pair) for pair in zip(x, x[1:])])  # bigrams
text_df['split3'] = text_df['split'] + text_df['split2']  # unigrams + bigrams
text_df['time_wordcount'] = text_df['split3'].apply(lambda x: len([wrd for wrd in x if wrd in time_words]))
Any ideas?
Solution
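One possible approach, offered as a minimal sketch rather than a definitive answer: build every word n-gram from length 1 up to the longest phrase in the list, and count those that appear in the list. The helper name `count_ngram_matches` and its `max_n` parameter are hypothetical and introduced only for this illustration; the data is the sample from the question.

```python
import pandas as pd

text_df = pd.DataFrame(
    {"answer1": ["take time",
                 "spend extra time adjust schedule",
                 "work more hour extend hour"]}
)
time_words = ["time", "extra time", "adjust schedule", "extend"]


def count_ngram_matches(text, phrases, max_n=None):
    """Count word n-grams (n = 1..max_n) of `text` that occur in `phrases`.

    Hypothetical helper for illustration. If `max_n` is not given, it is
    taken from the longest phrase, so trigrams are covered automatically
    once a three-word phrase is added to the list.
    """
    phrase_set = set(phrases)  # set membership is O(1) per lookup
    if max_n is None:
        max_n = max((len(p.split()) for p in phrase_set), default=0)
    tokens = text.split()
    count = 0
    for n in range(1, max_n + 1):
        # Slide a window of n tokens across the text
        for i in range(len(tokens) - n + 1):
            if " ".join(tokens[i:i + n]) in phrase_set:
                count += 1
    return count


text_df["time_wordcount"] = text_df["answer1"].apply(
    lambda s: count_ngram_matches(s, time_words)
)
print(text_df)
```

Note that overlapping matches are counted separately, just as in the long-hand version in the question: in the second row, "time", "extra time", and "adjust schedule" each count once, giving 3. If a word inside a matched phrase should not be counted again, the matching logic would need to change.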