python - Ensure the columns of a pandas DataFrame have unique values
Problem description
Given the following:
import pandas as pd

information_dict_from = {
    "v1": {0: "type a", 1: "type b"},
    "v2": {0: "type a", 1: "type b", 3: "type c"},
    "v3": {0: "type a", 1: "type b"},
}
data_from = pd.DataFrame(
    {
        "v1": [0, 0, 1, 1],
        "v2": [0, 1, 1, 3],
        "v3": [0, 1, 1, 0],
    }
)
I want to convert it to:
information_dict_to = {
    "v1": {0: "type a", 1: "type b"},
    "v2": {2: "type a", 3: "type b", 4: "type c"},
    "v3": {5: "type a", 6: "type b"},
}
data_to = pd.DataFrame(
    {
        "v1": [0, 0, 1, 1],
        "v2": [2, 3, 3, 4],
        "v3": [5, 6, 6, 5],
    }
)
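Note: after the conversion, the values in the DataFrame's columns are mutually exclusive (for example, set(df['v1']) - set(df['v2']) == set(df['v1'])), and the mapping between the keys of information_dict_from[<var>] and the corresponding column <var> is preserved.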
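Both conditions in the note can be checked programmatically. The following is a minimal sketch, not part of the original question: check_conversion is a hypothetical helper, and it assumes every column value appears as a key in its dictionary. Condition 1 is tested pairwise with set intersection, which is equivalent to set(df[a]) - set(df[b]) == set(df[a]). Condition 2 is tested by decoding each cell to its label with Series.map before and after the conversion.

```python
import itertools

import pandas as pd

information_dict_from = {
    "v1": {0: "type a", 1: "type b"},
    "v2": {0: "type a", 1: "type b", 3: "type c"},
    "v3": {0: "type a", 1: "type b"},
}
data_from = pd.DataFrame({"v1": [0, 0, 1, 1], "v2": [0, 1, 1, 3], "v3": [0, 1, 1, 0]})
information_dict_to = {
    "v1": {0: "type a", 1: "type b"},
    "v2": {2: "type a", 3: "type b", 4: "type c"},
    "v3": {5: "type a", 6: "type b"},
}
data_to = pd.DataFrame({"v1": [0, 0, 1, 1], "v2": [2, 3, 3, 4], "v3": [5, 6, 6, 5]})


def check_conversion(df_from, df_to, info_from, info_to):
    """Hypothetical helper: True if df_to satisfies both conditions of the note."""
    # Condition 1: no value is shared by any two columns,
    # i.e. set(df[a]) - set(df[b]) == set(df[a]) for every pair a, b.
    for a, b in itertools.combinations(df_to.columns, 2):
        if set(df_to[a]) & set(df_to[b]):
            return False
    # Condition 2: every cell still decodes to the same label as before.
    for col in df_from.columns:
        labels_from = df_from[col].map(info_from[col])
        labels_to = df_to[col].map(info_to[col])
        if not labels_from.equals(labels_to):
            return False
    return True


print(check_conversion(data_from, data_to, information_dict_from, information_dict_to))  # True
print(check_conversion(data_from, data_from, information_dict_from, information_dict_from))  # False
```

The second call returns False because the unconverted columns v1, v2 and v3 all share the values 0 and 1.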
Solution
# copy *_to from *_from
data_to = data_from.copy()
information_dict_to = information_dict_from.copy()
# counter that keeps increasing across all columns, so new values never repeat
val = 0
for col in data_from:  # for each column (v1, v2, v3)
    u_val_map = {}  # mapping: old value -> new unique value
    for u in data_from[col].unique():  # every distinct value in the column
        data_to.loc[data_from[col] == u, col] = val  # set the new unique value
        u_val_map[u] = val  # record the mapping
        val += 1  # move on to the next unused value
    # rebuild the dict for key == col using the mapping
    # (raises KeyError if a dict key never occurs in the column)
    information_dict_to.update({col: {
        u_val_map[key]: information_dict_from[col][key]
        for key in information_dict_from[col]}})
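As an alternative, the renumbering can be driven by the dictionary keys rather than by the data, and the per-value .loc assignments can be replaced with a single vectorised Series.map call per column. This is a sketch under stated assumptions, not the answerer's method. make_columns_unique is a hypothetical name, and new numbers are assigned in the key order of each information_dict_from[col] (the loop above uses the order of first appearance in the data instead). One difference from the loop: a dictionary key that never appears in the column still receives a number instead of raising KeyError. A column value missing from the dictionary would become NaN.

```python
import pandas as pd

information_dict_from = {
    "v1": {0: "type a", 1: "type b"},
    "v2": {0: "type a", 1: "type b", 3: "type c"},
    "v3": {0: "type a", 1: "type b"},
}
data_from = pd.DataFrame({"v1": [0, 0, 1, 1], "v2": [0, 1, 1, 3], "v3": [0, 1, 1, 0]})


def make_columns_unique(data, info):
    """Hypothetical helper: renumber each column so no two columns share a value.

    New keys follow the key order of each info[col] dict (an assumption).
    """
    new_columns = {}
    info_out = {}
    offset = 0
    for col in data.columns:
        # old key -> new key, numbered consecutively starting at `offset`
        key_map = {old: offset + i for i, old in enumerate(info[col])}
        offset += len(key_map)
        # one vectorised lookup per column; values missing from key_map become NaN
        new_columns[col] = data[col].map(key_map)
        info_out[col] = {key_map[old]: label for old, label in info[col].items()}
    return pd.DataFrame(new_columns, index=data.index), info_out


data_to, information_dict_to = make_columns_unique(data_from, information_dict_from)
print(data_to)
print(information_dict_to)
```

For the example input this yields the same data_to and information_dict_to as the loop, because each dictionary's key order matches the order in which the values first appear in its column.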
Then:
>>> data_to
   v1  v2  v3
0   0   2   5
1   0   3   6
2   1   3   6
3   1   4   5
>>> information_dict_to
{'v1': {0: 'type a', 1: 'type b'},
 'v2': {2: 'type a', 3: 'type b', 4: 'type c'},
 'v3': {5: 'type a', 6: 'type b'}}