python - Pandas - dataframe containing comments (rows) and words as column headers: how to get a frequency count?
Problem description
I am trying to perform a word frequency count on a relatively large dataframe and don't know what approach would be the best.
Currently my dataframe looks like this -
Comment        'I'  'it'  'is'  'up'
'I was here'   NaN  NaN   NaN   NaN
'I like soup'  NaN  NaN   NaN   NaN
'whats up'     NaN  NaN   NaN   NaN
'This is it'   NaN  NaN   NaN   NaN
My goal is to perform a frequency count for each of the words in the column headers ('I', 'it', 'is', 'up') for each comment. E.g. after the counting process the result should look something like this -
Comment        'I'  'it'  'is'  'up'
'I was here'   1    0     0     0
'I like soup'  1    0     0     0
'whats up'     0    0     0     1
'This is it'   0    1     1     0
What would be the best approach to this? The real dataset contains about 50k comments and over 10k columns with different words.
Solution
I don't think there is a better approach than the following:
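import re
for column in df.columns[1:]:  # All but the Comment column.
    # Test each comment for the header word as a whole word (\b = word boundary), so 'up' does not match 'soup'.
    df[column] = df['Comment'].str.contains(rf"\b{re.escape(column)}\b")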
This gives you a boolean matrix, which you can map to bits (0 and 1) if you really need to, for example with .astype(int).
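If you want actual counts rather than a present/absent flag, one possible variation is to swap `str.contains` for `str.count`. The sketch below rebuilds the four example comments from the question; the helper name `count_words` and the choice of case-sensitive, whole-word matching are assumptions made for illustration, not something the question specifies.

```python
import re

import pandas as pd

# Example data taken from the question.
df = pd.DataFrame(
    {"Comment": ["I was here", "I like soup", "whats up", "This is it"]}
)
words = ["I", "it", "is", "up"]  # the word column headers


def count_words(comments: pd.Series, words: list) -> pd.DataFrame:
    """Count whole-word, case-sensitive occurrences of each word per comment.

    Hypothetical helper: \\b word boundaries keep 'up' from matching 'soup',
    and re.escape guards against headers containing regex metacharacters.
    """
    counts = {
        word: comments.str.count(rf"\b{re.escape(word)}\b") for word in words
    }
    return pd.DataFrame(counts, index=comments.index)


counts = count_words(df["Comment"], words)
result = pd.concat([df[["Comment"]], counts], axis=1)
print(result)
```

On the example data this reproduces the table from the question. Note that it still makes one regex pass over all comments per word, and a dense 50k x 10k int64 result takes roughly 4 GB, so for the real dataset a smaller integer dtype or a sparse representation may be worth considering.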