python - Sort multiple columns' values from min to max, and put in new columns in pandas dataframe
Problem description
I have a dataframe with datetime objects in columns 3 through 6. I want to sort these dates into new columns P_min, P_2, P_3 and P_max, from the earliest ("min") to the latest ("max") date. I can easily get the min and max values and put them into their own columns. However, how can I get the middle values (P_2 and P_3)?
This is what I have so far:
import pandas as pd
df = pd.DataFrame(data={'Name':['a','b','c','d'],'Number':[1,2,3,4], 'Contact':['foo1','foo2','foo3','foo4'],3:[pd.to_datetime('1/1/2015'),pd.NaT,pd.NaT,pd.to_datetime('1/1/2015')],4:[pd.to_datetime('2/20/2002'),pd.to_datetime('2/20/2002'),pd.to_datetime('2/20/2002'),pd.to_datetime('2/20/2002')], 5:[pd.NaT,pd.NaT,pd.NaT,pd.to_datetime('3/15/2015')], 6:[pd.NaT,pd.to_datetime('3/15/2015'),pd.NaT,pd.to_datetime('4/10/2007')]});
> df
  Name  Number Contact          3          4          5          6
0    a       1    foo1 2015-01-01 2002-02-20        NaT        NaT
1    b       2    foo2        NaT 2002-02-20        NaT 2015-03-15
2    c       3    foo3        NaT 2002-02-20        NaT        NaT
3    d       4    foo4 2015-01-01 2002-02-20 2015-03-15 2007-04-10
Then I can manually set the min and max values:
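df['P_min'] = df.iloc[:, 3:7].min(axis=1)  # axis=1 takes the min across the columns, one value per row; 3:7 covers positions 3-6 (iloc excludes the stop)
df['P_max'] = df.iloc[:, 3:7].max(axis=1)  # axis=1 takes the max across the columns, one value per row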
I'm trying to make something work where I replace the min/max values so I could get a new min value which would be P_2, and so forth...
df.iloc[:,3:7].replace(to_replace=df.iloc[:,3:7].min(axis=1), value=pd.NaT)
Could someone please help with a more efficient or easy method such as a for loop?
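One loop-style way to get all four columns is to sort each row on its own with `DataFrame.apply(..., axis=1)`, which calls a function once per row. This is a minimal sketch, assuming the question's frame (rebuilt with ISO date strings) and that the dates sit at positions 3 to 6; `sort_row`, `sorted_dates` and `out` are illustrative names. `Series.sort_values(na_position='last')` keeps NaT out of P_min.

```python
import pandas as pd

# Same data as the question, written with ISO date strings.
df = pd.DataFrame({
    'Name': ['a', 'b', 'c', 'd'],
    'Number': [1, 2, 3, 4],
    'Contact': ['foo1', 'foo2', 'foo3', 'foo4'],
    3: pd.to_datetime(['2015-01-01', None, None, '2015-01-01']),
    4: pd.to_datetime(['2002-02-20'] * 4),
    5: pd.to_datetime([None, None, None, '2015-03-15']),
    6: pd.to_datetime([None, '2015-03-15', None, '2007-04-10']),
})

new_cols = ['P_min', 'P_2', 'P_3', 'P_max']  # output column names from the question


def sort_row(row):
    """Return one row's dates in ascending order, with NaT pushed to the end."""
    ordered = row.sort_values(na_position='last').to_numpy()
    return pd.Series(ordered, index=new_cols)


# apply(axis=1) runs sort_row once per row, which plays the role of the for loop.
sorted_dates = df.iloc[:, 3:7].apply(sort_row, axis=1)
out = df.join(sorted_dates)
print(out)
```

Row-wise `apply` is easy to read, but it runs Python code for every row, so a vectorized NumPy sort is usually faster on large frames.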
Solution
Here is an elegant solution: convert the date columns to a numpy matrix of ints, sort each row, then convert back to datetime.
import pandas as pd
import numpy as np
df = pd.DataFrame(data={'Name':['a','b','c','d'],'Number':[1,2,3,4], 'Contact':['foo1','foo2','foo3','foo4'],3:[pd.to_datetime('1/1/2015'),pd.NaT,pd.NaT,pd.to_datetime('1/1/2015')],4:[pd.to_datetime('2/20/2002'),pd.to_datetime('2/20/2002'),pd.to_datetime('2/20/2002'),pd.to_datetime('2/20/2002')], 5:[pd.NaT,pd.NaT,pd.NaT,pd.to_datetime('3/15/2015')], 6:[pd.NaT,pd.to_datetime('3/15/2015'),pd.NaT,pd.to_datetime('4/10/2007')]});
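matrix = np.array(df[df.columns[3:7]].astype('int64'))  # dates -> int64 (nanoseconds for datetime64[ns] columns); NaT becomes the smallest int64
matrix.sort(axis=1)  # sort each row in place, so NaT values end up first
df_t = pd.DataFrame(matrix, columns=['P_min', 'P_2', 'P_3', 'P_max'])
conc = [pd.to_datetime(df_t[x]) for x in df_t.columns]  # convert each int column back to datetime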
pd.concat([df] + conc, axis = 1)
Out[1]:
Name Number Contact 3 4 5 6 P_min P_2 P_3 P_max
0 a 1 foo1 2015-01-01 2002-02-20 NaT NaT NaT NaT 2002-02-20 2015-01-01
1 b 2 foo2 NaT 2002-02-20 NaT 2015-03-15 NaT NaT 2002-02-20 2015-03-15
2 c 3 foo3 NaT 2002-02-20 NaT NaT NaT NaT NaT 2002-02-20
3 d 4 foo4 2015-01-01 2002-02-20 2015-03-15 2007-04-10 2002-02-20 2007-04-10 2015-01-01 2015-03-15
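The NaT values land in P_min because, once the dates are cast to integers, NaT is represented by the smallest 64-bit integer (-9223372036854775808), so a plain ascending sort puts it first. A small NumPy check of that behaviour, using an illustrative three-element array:

```python
import numpy as np
import pandas as pd

# Three dates, one of them missing (NaT), stored as datetime64[ns].
dates = np.array(['2015-01-01', 'NaT', '2002-02-20'], dtype='datetime64[ns]')

# Reinterpret the same 64-bit values as integers (nanoseconds since the epoch).
as_int = dates.view('int64')

# NaT is encoded as the smallest possible int64 ...
print(as_int[1] == np.iinfo(np.int64).min)  # True

# ... so an ascending integer sort puts it in front of every real date,
# and converting back to datetime yields NaT in the first position.
back = pd.to_datetime(np.sort(as_int))
print(pd.isna(back[0]))  # True
```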
A tricky way to make every P_min an actual date and avoid NaT:
import pandas as pd
import numpy as np
df = pd.DataFrame(data={'Name':['a','b','c','d'],'Number':[1,2,3,4], 'Contact':['foo1','foo2','foo3','foo4'],3:[pd.to_datetime('1/1/2015'),pd.NaT,pd.NaT,pd.to_datetime('1/1/2015')],4:[pd.to_datetime('2/20/2002'),pd.to_datetime('2/20/2002'),pd.to_datetime('2/20/2002'),pd.to_datetime('2/20/2002')], 5:[pd.NaT,pd.NaT,pd.NaT,pd.to_datetime('3/15/2015')], 6:[pd.NaT,pd.to_datetime('3/15/2015'),pd.NaT,pd.to_datetime('4/10/2007')]});
matrix = np.array(df[df.columns[3:7]].astype('int64'))  # dates -> int64; NaT becomes the smallest int64
matrix[matrix == np.iinfo(np.int64).min] = 4102444800000000000  # replace NaT (-9223372036854775808) with 2100-01-01 in ns, an obvious placeholder that is easy to filter out afterwards
matrix.sort(axis = 1)
df_t = pd.DataFrame(matrix, columns = ['P_min', 'P_2', 'P_3', 'P_max'])
conc = [pd.to_datetime(df_t[x]) for x in df_t.columns]
pd.concat([df] + conc, axis = 1)
Out[2]:
Name Number Contact 3 4 5 6 P_min P_2 P_3 P_max
0 a 1 foo1 2015-01-01 2002-02-20 NaT NaT 2002-02-20 2015-01-01 2100-01-01 2100-01-01
1 b 2 foo2 NaT 2002-02-20 NaT 2015-03-15 2002-02-20 2015-03-15 2100-01-01 2100-01-01
2 c 3 foo3 NaT 2002-02-20 NaT NaT 2002-02-20 2100-01-01 2100-01-01 2100-01-01
3 d 4 foo4 2015-01-01 2002-02-20 2015-03-15 2007-04-10 2002-02-20 2007-04-10 2015-01-01 2015-03-15
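If the 2100-01-01 placeholder is unwanted, a sentinel-free variant is to sort the datetime64 values directly instead of their integer form: since NumPy 1.18, `np.sort` places NaT after every real date, much like NaN. A minimal sketch, assuming NumPy >= 1.18 and the same four date columns at positions 3 to 6; `sorted_df` and `out` are illustrative names:

```python
import numpy as np
import pandas as pd

# Same data as the question, written with ISO date strings.
df = pd.DataFrame({
    'Name': ['a', 'b', 'c', 'd'],
    'Number': [1, 2, 3, 4],
    'Contact': ['foo1', 'foo2', 'foo3', 'foo4'],
    3: pd.to_datetime(['2015-01-01', None, None, '2015-01-01']),
    4: pd.to_datetime(['2002-02-20'] * 4),
    5: pd.to_datetime([None, None, None, '2015-03-15']),
    6: pd.to_datetime([None, '2015-03-15', None, '2007-04-10']),
})

new_cols = ['P_min', 'P_2', 'P_3', 'P_max']

# Columns at positions 3..6 as a 2-D datetime64[ns] array (no int cast needed).
dates = df.iloc[:, 3:7].to_numpy(dtype='datetime64[ns]')

# np.sort returns a sorted copy; with axis=1 each row is sorted independently,
# and NumPy >= 1.18 places NaT after all real dates.
sorted_dates = np.sort(dates, axis=1)

sorted_df = pd.DataFrame(sorted_dates, columns=new_cols, index=df.index)
out = pd.concat([df, sorted_df], axis=1)
print(out[new_cols])
```

Missing dates stay as NaT at the right-hand end (for row c only P_min is filled), so there is no placeholder to filter out afterwards.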