python - How to get combination of combinations in pandas dataframe?
Problem description
I have a dataset of transactions. Each transaction can have one or more distinct values ('dimensions'); the same value cannot appear twice within a transaction. I want to create a dataframe with the dimensions as both the rows and the columns, counting how many times each dimension was used together with another in the same transaction.
Here is what I tried:
import pandas as pd

dim_set = [ (1, 'Customer group$Large'),
(1, 'DEPARTMENT$Sales'),
(2, 'Customer group$Medium'),
(2, 'DEPARTMENT$Sales'),
(3, 'DEPARTMENT$Sales'),
(4, 'Customer group$Small'),
(4, 'DEPARTMENT$Sales')
]
df = pd.DataFrame(dim_set, columns=['combination_id', 'dimension'])
df
df_st_1 = df.pivot_table(index = 'dimension', columns = 'dimension',values = 'combination_id', aggfunc = 'count')
df_st_1
The expected result should look like this:
dim_set = [ ('Customer group$Large', 1, 1, 0, 0),
('DEPARTMENT$Sales', 1, 4, 1, 1),
('Customer group$Medium', 0, 1, 1, 0),
('Customer group$Small', 0, 1, 0, 1)
]
df = pd.DataFrame(dim_set, columns=['dimension','Customer group$Large', 'DEPARTMENT$Sales', 'Customer group$Medium', 'Customer group$Small'])
df
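Solution
Use DataFrame.merge to self-join the frame on combination_id, then pass the two dimension columns to crosstab, and finally do some data cleaning with DataFrame.reset_index and DataFrame.rename_axis: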
df1 = df.merge(df, on='combination_id', suffixes=('','_'))
df1 = (pd.crosstab(df1['dimension'], df1['dimension_'])
.reset_index()
.rename_axis(None)
.rename_axis(None, axis=1))
print (df1)
               dimension  Customer group$Large  Customer group$Medium  \
0   Customer group$Large                     1                      0
1  Customer group$Medium                     0                      1
2   Customer group$Small                     0                      0
3       DEPARTMENT$Sales                     1                      1

   Customer group$Small  DEPARTMENT$Sales
0                     0                 1
1                     0                 1
2                     1                 1
3                     1                 4
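As a cross-check, the same table can probably also be obtained without the self-merge: build a transaction-by-dimension incidence matrix with crosstab and multiply its transpose by itself. This is only a sketch of an alternative, not part of the answer above; the names incidence, co_occurrence and result are made up for illustration, and it relies on the question's statement that a dimension appears at most once per transaction.

```python
import pandas as pd

dim_set = [
    (1, 'Customer group$Large'),
    (1, 'DEPARTMENT$Sales'),
    (2, 'Customer group$Medium'),
    (2, 'DEPARTMENT$Sales'),
    (3, 'DEPARTMENT$Sales'),
    (4, 'Customer group$Small'),
    (4, 'DEPARTMENT$Sales'),
]
df = pd.DataFrame(dim_set, columns=['combination_id', 'dimension'])

# Rows are transactions, columns are dimensions;
# a cell is 1 when the dimension occurs in that transaction.
incidence = pd.crosstab(df['combination_id'], df['dimension'])

# (dimension x transaction) times (transaction x dimension)
# gives, for every pair of dimensions, the number of shared transactions.
co_occurrence = incidence.T.dot(incidence)

# Same clean-up as in the answer: move the index into a regular column
# and drop the name of the columns axis.
result = co_occurrence.reset_index().rename_axis(None, axis=1)
print(result)
```

The diagonal holds how many transactions each dimension appears in, and the off-diagonal cells hold the pairwise counts. The merge approach materialises every pair of rows within a transaction, so the matrix product may be lighter on memory when transactions carry many dimensions.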