python - Creating a pandas pivot table to count number of times items appear in a list together
Problem description
I am trying to count the number of times users look at pages in the same session.
I am starting with a data frame listing user_ids and the page slugs they have visited:
user_id page_view_page_slug
1 slug1
1 slug2
1 slug3
1 slug4
2 slug5
2 slug3
2 slug2
2 slug1
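To make the examples reproducible, the sample data above can be built as a DataFrame like this. It is a minimal sketch: the column names come from the listing, and the variable name `df` is assumed.

```python
import pandas as pd

# Sample data from the listing above: one row per (user, page slug) view
df = pd.DataFrame({
    'user_id': [1, 1, 1, 1, 2, 2, 2, 2],
    'page_view_page_slug': ['slug1', 'slug2', 'slug3', 'slug4',
                            'slug5', 'slug3', 'slug2', 'slug1'],
})
print(df)
```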
What I am looking to get is a pivot table counting the user_ids for each cross section (pair) of slugs:
. | slug1 | slug2 | slug3 | slug4 | slug5 |
---|---|---|---|---|---|
slug1 | 2 | 2 | 2 | 1 | 1 |
slug2 | 2 | 2 | 2 | 1 | 1 |
slug3 | 2 | 2 | 2 | 1 | 1 |
slug4 | 1 | 1 | 1 | 1 | 0 |
slug5 | 1 | 1 | 1 | 0 | 1 |
I realize the table is symmetric (the slug1/slug2 cell repeats the slug2/slug1 cell), but I can't think of a better way. So far I have written a listagg function:
import pandas as pd
def listagg(df, grouping_idx):
    # Collect each column's values into one list per group
    return df.groupby(grouping_idx).agg(list)

new_df = listagg(df, 'user_id')
Returning:
page_view_page_slug
user_id
1 [slug1, slug2, slug3, slug4]
2 [slug5, slug3, slug2, slug1]
7 [slug6, slug4, slug7]
9 [slug3, slug5, slug1]
But I am struggling to think of a loop that counts how often items appear in a list together (regardless of order), and of how to store those counts. I also do not know how I would get the result into a pivot-table format.
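One way to write that loop is to take each user's list (as produced by the listagg step), generate every unordered pair of distinct slugs with `itertools.combinations`, and tally the pairs in a `collections.Counter`. The tallies can then be stored in a Series with a two-level index and unstacked into a square table. This is a sketch assuming the sample data above; deduplicating each list with `set` (so repeated views by one user count once) and filling the diagonal with the number of users who viewed each slug are assumptions about what the table should contain.

```python
from collections import Counter
from itertools import combinations

import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    'user_id': [1, 1, 1, 1, 2, 2, 2, 2],
    'page_view_page_slug': ['slug1', 'slug2', 'slug3', 'slug4',
                            'slug5', 'slug3', 'slug2', 'slug1'],
})

# One list of slugs per user, equivalent to the listagg step
lists = df.groupby('user_id')['page_view_page_slug'].agg(list)

pair_counts = Counter()
for slugs in lists:
    # Deduplicate and sort so each unordered pair is counted once per user
    unique = sorted(set(slugs))
    for s in unique:
        pair_counts[(s, s)] += 1  # diagonal: users who viewed s at all
    for s, t in combinations(unique, 2):
        # Store both orientations so the table comes out symmetric
        pair_counts[(s, t)] += 1
        pair_counts[(t, s)] += 1

# (row slug, column slug) keys become a two-level index; unstack moves level 1 into columns
counts = pd.Series(list(pair_counts.values()),
                   index=pd.MultiIndex.from_tuples(list(pair_counts.keys())))
table = counts.unstack(fill_value=0).sort_index().sort_index(axis=1)
print(table)
```

Pairs that never occur together (slug4 and slug5 here) are missing from the Counter, so `fill_value=0` is what turns them into zeros in the table.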
Solution
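Here is another approach: use numpy broadcasting to build a matrix by comparing each value of user_id with every other value, create a new DataFrame from that matrix with its index and columns set to page_view_page_slug, then sum the duplicate slug labels (level=0) along axis=0 and axis=1 to count the user_ids for each cross section of slugs: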
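import pandas as pd
a = df['user_id'].values
i = list(df['page_view_page_slug'])
# a[:, None] == a is True wherever two rows share a user_id
m = pd.DataFrame(a[:, None] == a, index=i, columns=i)
# Sum duplicate slug labels on rows, then on columns; replaces .sum(level=0), removed in pandas 2.0
res = m.groupby(level=0).sum().T.groupby(level=0).sum().T.astype(int)
print(res)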
slug1 slug2 slug3 slug4 slug5
slug1 2 2 2 1 1
slug2 2 2 2 1 1
slug3 2 2 2 1 1
slug4 1 1 1 1 0
slug5 1 1 1 0 1
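An alternative that avoids the n × n comparison matrix (a swapped-in technique, not part of the answer above) is to build a user-by-slug indicator table with `pd.crosstab` and multiply its transpose by itself. Entry (s, t) of the product is then the number of users who viewed both s and t. This is a sketch on the sample data; clipping the counts at 1 assumes that a user viewing the same slug twice should still count once.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    'user_id': [1, 1, 1, 1, 2, 2, 2, 2],
    'page_view_page_slug': ['slug1', 'slug2', 'slug3', 'slug4',
                            'slug5', 'slug3', 'slug2', 'slug1'],
})

# Rows = users, columns = slugs, value 1 if the user viewed the slug
ind = pd.crosstab(df['user_id'], df['page_view_page_slug']).clip(upper=1)

# (slugs x users) @ (users x slugs): entry [s, t] counts users who saw both s and t
co = ind.T @ ind
co.index.name = None
co.columns.name = None
print(co)
```

The broadcasting version allocates a matrix with one cell per pair of rows in `df`, while this one works on a users × slugs table, which may matter when the page-view log is long.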