python - How do I calculate number of words and number of unique words contained within a list of a column across all rows of my dataframe?
Problem description
I generated a column df['adjectives'] in my pandas dataframe that has a list of all the adjectives from another column, df['reviews'].
The values of df['adjectives'] are in this format, for example:
['excellent', 'better', 'big', 'unexpected', 'excellent', 'big']
I would like to create a new column that counts the total number of words in df['adjectives'] as well as the number of 'unique' words in df['adjectives'].
The function should iterate across the entire dataframe and apply the counts for each row.
For the above row example, I would want df['totaladj'] to be 6 and df['uniqueadj'] to be 4 (since 'excellent' and 'big' are repeated).
import pandas as pd
df=pd.read_csv('./data.csv')
df['totaladj'] = df['adjectives'].str.count(' ') + 1
df.to_csv('./data.csv', index=False)
The above code works when counting the total number of adjectives, but not the unique number of adjectives.
Solution
Is this the type of behavior that you are looking for?
Based on your description, I assumed that each value in the adjectives column is a string formatted like a list, e.g. "['big','excellent','small']"
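The code below converts each string to a list using split(), and then gets the length using len(). Finding the number of unique adjectives is done by converting the list to a set before using len(). Each piece is stripped of surrounding whitespace first, because splitting "['excellent', 'big', 'excellent']" on ',' leaves a leading space on every item after the first, and the two copies of 'excellent' would otherwise be counted as different words.
df['adjcount'] = df['adjectives'].apply(lambda x: len(x[1:-1].split(',')))
df['uniqueadjcount'] = df['adjectives'].apply(lambda x: len(set(w.strip() for w in x[1:-1].split(','))))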
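If the column really does hold list-formatted strings (an assumption carried over from above), another option is to parse each cell into a real Python list with ast.literal_eval instead of slicing and splitting by hand. The sketch below is only one way to do it. The helper names parse_adjectives, adjective_counts and add_adjective_counts are made up for illustration, and the output columns use the totaladj / uniqueadj names from the question.

```python
import ast


def parse_adjectives(cell):
    """Turn one cell of the 'adjectives' column into a list of words.

    Assumes the cell is either already a list or a string such as
    "['big', 'excellent']"; anything else is treated as empty.
    """
    if isinstance(cell, list):
        return cell
    if isinstance(cell, str):
        return list(ast.literal_eval(cell))
    return []


def adjective_counts(cell):
    """Return (total words, unique words) for one cell."""
    words = parse_adjectives(cell)
    return len(words), len(set(words))


def add_adjective_counts(df):
    """Return a copy of df with 'totaladj' and 'uniqueadj' columns added.

    df is assumed to be a pandas DataFrame with an 'adjectives' column.
    """
    parsed = df['adjectives'].apply(parse_adjectives)
    out = df.copy()
    out['totaladj'] = parsed.apply(len)
    out['uniqueadj'] = parsed.apply(lambda words: len(set(words)))
    return out
```

With the data from the question this could be called as df = add_adjective_counts(pd.read_csv('./data.csv')). Unlike the split(',') version, an empty list "[]" is counted as 0 words rather than 1, and whitespace around the commas does not matter.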