python - How to label same pandas dataframe rows?
Problem description
I have a large pandas dataframe like this:
log apple watermelon orange lemon grapes
1 1 1 yes 0 0
1 2 0 1 0 0
1 True 0 0 0 2
2 0 0 0 0 2
2 1 1 yes 0 0
2 0 0 0 0 2
2 0 0 0 0 2
3 True 0 0 0 2
4 0 0 0 0 2.1
4 0 0 0 0 2.1
How can I label the rows that are the same, for example:
log apple watermelon orange lemon grapes ID
1 1 1 yes 0 0 1
1 2 0 1 0 0 2
1 True 0 0 0 2 3
2 0 0 0 0 2 4
2 1 1 yes 0 0 1
2 0 0 0 0 2 4
2 0 0 0 0 2 4
3 True 0 0 0 2 3
4 0 0 0 0 2.1 5
4 0 0 0 0 2.1 5
I tried to:
df['ID']=df.groupby('log')[df.columns].transform('ID')
And
df['personid'] = df['log'].clip_upper(2) - 2*d.duplicated(subset='apple')
df
However, neither attempt works: I have a lot of columns, and the result is not the expected output. Any idea how to group and label this dataframe?
Solution
Given
import io
import pandas as pd

x = io.StringIO("""log apple watermelon orange lemon grapes
1 1 1 yes 0 0
1 2 0 1 0 0
1 True 0 0 0 2
2 0 0 0 0 2
2 1 1 yes 0 0
2 0 0 0 0 2
2 0 0 0 0 2
3 True 0 0 0 2
4 0 0 0 0 2.1
4 0 0 0 0 2.1""")
# sep=r"\s+" splits on any run of whitespace (delim_whitespace is deprecated in recent pandas)
df2 = pd.read_table(x, sep=r"\s+")
You can first convert each row to a tuple, which makes the rows hashable and comparable, and then use range to create a unique id for each distinct tuple
# Each row becomes a tuple, so whole rows can be hashed and compared
f = df2.apply(tuple, axis=1)
# Map every distinct row (in sorted order) to an id 1..n
uniq = sorted(set(f))
k = dict(zip(uniq, range(1, len(uniq) + 1)))
Finally
# Look up the id of each row's tuple
df2['id'] = f.map(k.get)
df2
   log  apple  watermelon  orange  lemon  grapes  id
0    1      1           1     yes      0     0.0   1
1    1      2           0       1      0     0.0   2
2    1   True           0       0      0     2.0   3
3    2      0           0       0      0     2.0   4
4    2      1           1     yes      0     0.0   5
5    2      0           0       0      0     2.0   4
6    2      0           0       0      0     2.0   4
7    3   True           0       0      0     2.0   6
8    4      0           0       0      0     2.1   7
9    4      0           0       0      0     2.1   7
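Note that the ids above treat log as part of each row, so rows that differ only in log get different ids (seven ids here, against five in the question's expected output). Below is a minimal sketch of one way to get the question's numbering. It assumes that log should be ignored when comparing rows and that ids should follow the order in which each distinct row first appears. The names raw, value_cols and the ID column are illustrative choices, not part of the original answer.

```python
import io

import pandas as pd

raw = io.StringIO("""log apple watermelon orange lemon grapes
1 1 1 yes 0 0
1 2 0 1 0 0
1 True 0 0 0 2
2 0 0 0 0 2
2 1 1 yes 0 0
2 0 0 0 0 2
2 0 0 0 0 2
3 True 0 0 0 2
4 0 0 0 0 2.1
4 0 0 0 0 2.1""")
df = pd.read_csv(raw, sep=r"\s+")

# Assumption: "same rows" means equal in every column except log.
value_cols = [c for c in df.columns if c != "log"]

# sort=False keeps groups in order of first appearance;
# ngroup() numbers groups from 0, so add 1 to start at 1.
df["ID"] = df.groupby(value_cols, sort=False).ngroup() + 1
print(df)
```

Because the grouping keys are built from the column list, this should scale to a frame with many columns without naming each one. Putting log back into value_cols would make log count as part of the row again, as in the answer above.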