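python - Scoring a DataFrame by id
Problem description
I have a DataFrame indexed by date, and I am trying to give each accountid a score based on its category: on the row's index date, the column matching that category should hold a 1. The resulting DataFrame should look like this.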
accountid category Smooth Hard Sharp Narrow
timestamp
2018-03-29 101 Smooth 1 NaN NaN NaN
2018-03-29 102 Hard NaN 1 NaN NaN
2018-03-30 103 Narrow NaN NaN NaN 1
2018-04-30 104 Sharp NaN NaN 1 NaN
2018-04-21 105 Narrow NaN NaN NaN 1
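What is the best way to loop over the DataFrame for each accountid and assign a score under each unstacked category column?
Here is the script that creates the DataFrame.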
import numpy as np
import pandas as pd

# Date range from the original script; it is not used below.
idx = pd.date_range('02-28-2018', '04-29-2018')

# The score columns start out empty (real NaN rather than the string 'NaN').
df = pd.DataFrame(
    [['101', '2018-03-29', 'Smooth', np.nan, np.nan, np.nan, np.nan],
     ['102', '2018-03-29', 'Hard', np.nan, np.nan, np.nan, np.nan],
     ['103', '2018-03-30', 'Narrow', np.nan, np.nan, np.nan, np.nan],
     ['104', '2018-04-30', 'Sharp', np.nan, np.nan, np.nan, np.nan],
     ['105', '2018-04-21', 'Narrow', np.nan, np.nan, np.nan, np.nan]],
    columns=['accountid', 'timestamp', 'category', 'Smooth', 'Hard', 'Sharp', 'Narrow'])
df['timestamp'] = pd.to_datetime(df['timestamp'])
df = df.set_index('timestamp')
print(df)
Solution
You can use get_dummies from the str accessor:
df[['accountid','category']].assign(**df['category'].str.get_dummies())
Output:
accountid category Hard Narrow Sharp Smooth
timestamp
2018-03-29 101 Smooth 0 0 0 1
2018-03-29 102 Hard 1 0 0 0
2018-03-30 103 Narrow 0 1 0 0
2018-04-30 104 Sharp 0 0 1 0
2018-04-21 105 Narrow 0 1 0 0
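To make the one-liner above easier to try on its own, here is a minimal self-contained sketch. It rebuilds only the accountid and category columns from the question; the variable names dummies and scored are illustrative and not part of the original answer.

```python
import pandas as pd

# Only the two columns the answer actually uses, with the question's dates.
df = pd.DataFrame(
    {
        "accountid": ["101", "102", "103", "104", "105"],
        "category": ["Smooth", "Hard", "Narrow", "Sharp", "Narrow"],
    },
    index=pd.DatetimeIndex(
        ["2018-03-29", "2018-03-29", "2018-03-30", "2018-04-30", "2018-04-21"],
        name="timestamp",
    ),
)

# One 0/1 indicator column per distinct category value.
dummies = df["category"].str.get_dummies()

# assign(**frame) unpacks the indicator frame column by column,
# so every category becomes a new column next to the original ones.
scored = df.assign(**dummies)
print(scored)
```

Because get_dummies builds the columns from the values it finds, a category that never occurs in the data gets no column at all.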
To replace the 0s with NaN:
import numpy as np

df[['accountid','category']].assign(**df['category'].str.get_dummies())\
    .replace(0, np.nan)
Output:
accountid category Hard Narrow Sharp Smooth
timestamp
2018-03-29 101 Smooth NaN NaN NaN 1.0
2018-03-29 102 Hard 1.0 NaN NaN NaN
2018-03-30 103 Narrow NaN 1.0 NaN NaN
2018-04-30 104 Sharp NaN NaN 1.0 NaN
2018-04-21 105 Narrow NaN 1.0 NaN NaN
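One caveat: calling replace(0, np.nan) on the whole frame would also blank out a genuine 0 in any other numeric column. A hedged variant, assuming you want only the indicator columns touched and want them in the column order shown in the question, might look like the sketch below. The wanted list and the use of reindex and where are illustrative choices, not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "accountid": ["101", "102", "103", "104", "105"],
        "category": ["Smooth", "Hard", "Narrow", "Sharp", "Narrow"],
    },
    index=pd.DatetimeIndex(
        ["2018-03-29", "2018-03-29", "2018-03-30", "2018-04-30", "2018-04-21"],
        name="timestamp",
    ),
)

# Assumed target column order, taken from the table in the question.
wanted = ["Smooth", "Hard", "Sharp", "Narrow"]

dummies = df["category"].str.get_dummies()

# reindex fixes the column order and adds an all-zero column
# for any wanted category that does not occur in the data.
dummies = dummies.reindex(columns=wanted, fill_value=0)

# where keeps a value only where the condition holds and puts NaN elsewhere,
# so the 1s survive and the 0s become NaN in the indicator columns only.
flags = dummies.where(dummies.eq(1))

result = df.assign(**flags)
print(result)
```

The indicator columns come out as floats here, since NaN cannot be stored in an integer column.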