python - Complex Grouping, Sorting and Value filtering in Pandas
Problem description
I have a pandas DataFrame in the following form:
Col1 Col2 Col3
t1 m1 1
t2 m2 7
t3 m1 4
t4 m2 8
t5 m3 4
t6 m3 6
t7 m3 11
t8 m4 9
I want to create labels (say high/low/med) in a new column based on the following: group the data by Col2 and then sort it (ascending) by Col1. Once that is done, take the first row of the sorted data (the lowest value) within each group and check the following:
If the first value of Col3 within a group > x:
    if any of the values in the remaining group <= x:
        assign high for the first match to Col4  #a1
    else:
        assign med for the first match to Col4  #a2
If the first value of Col3 within a group <= x and any of the values in the remaining group < or >= x:
    assign low for the first match to Col4  #b1
Also if there is only one element in a group ( as in the case m4 below), then we assign a value high/low based on value >or
At the end, drop the remaining elements of each group except the first one.
Thus assuming x = 4 and assuming (t1
Col1 Col2 Col3 Col4
t1 m1 1 low
t2 m2 7 med
t3 m1 4 # should drop
t4 m2 8 # should drop
t5 m3 5 high
t6 m3 2 # should drop
t7 m3 11 # should drop
t8 m4 9 high
So final table will be like:
Col1 Col2 Col3 Col4
t1 m1 1 low
t2 m2 7 med
t5 m3 5 high
t8 m4 9 high
Solution
IIUC, take this example dataframe (your OP has conflicting dataframes):
import pandas as pd
import numpy as np
df = pd.DataFrame([
['t1', 'm1', 1],
['t2', 'm2', 7],
['t3', 'm1', 4],
['t4', 'm2', 8],
['t5', 'm3', 5],
['t6', 'm3', 2],
['t7', 'm3', 11],
['t8', 'm4', 9]],
columns=['Col1', 'Col2', 'Col3'])
Yields:
Col1 Col2 Col3
0 t1 m1 1
1 t2 m2 7
2 t3 m1 4
3 t4 m2 8
4 t5 m3 5
5 t6 m3 2
6 t7 m3 11
7 t8 m4 9
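Then define your function with your logic:
def my_function(x, val):
    # x is the Col3 Series of one group, already sorted by Col1
    if x.shape[0] == 1:
        # single-row group: label by the value alone
        if x.iloc[0] > val:
            return 'high'
        else:
            return 'low'
    # first value above val and some later value at or below it
    if x.iloc[0] > val and any(i <= val for i in x.iloc[1:]):
        return 'high'
    elif x.iloc[0] > val:
        return 'med'
    elif x.iloc[0] <= val:
        return 'low'
    else:
        # only reached when both comparisons fail, e.g. the first value is NaN
        return np.nan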
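As a quick sanity check, the function can be called directly on small hand-built Series, one per branch. This is only a sketch: the function is restated in compact form so the snippet runs on its own, and the test Series are made up for illustration rather than taken from the OP.

```python
import numpy as np
import pandas as pd

def my_function(x, val):
    # Same logic as above, restated so this snippet is runnable on its own.
    if x.shape[0] == 1:
        return 'high' if x.iloc[0] > val else 'low'
    if x.iloc[0] > val and any(i <= val for i in x.iloc[1:]):
        return 'high'
    elif x.iloc[0] > val:
        return 'med'
    elif x.iloc[0] <= val:
        return 'low'
    return np.nan

# Illustrative inputs (assumed, not from the OP), all checked against val=4.
checks = {
    'single_above': my_function(pd.Series([9]), 4),        # one row, value > 4
    'single_below': my_function(pd.Series([3]), 4),        # one row, value <= 4
    'first_above_rest_mixed': my_function(pd.Series([5, 2, 11]), 4),
    'first_above_rest_above': my_function(pd.Series([7, 8]), 4),
    'first_below': my_function(pd.Series([1, 4]), 4),
}
print(checks)
```

Note that the comparison against the threshold is strict for the first value (`>`), so a first value equal to `val` falls into the 'low' branch.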
Now apply the function to your sorted and grouped dataframe, using 4 (your example value for x) as the input, as stated in your OP:
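# label every row with its group's result; the assignment aligns on the index
df['Col4'] = df.sort_values(['Col2','Col1']).groupby('Col2')['Col3'].transform(my_function, 4)
# keep only the first row of each group
df = df.sort_values(['Col2','Col1']).groupby('Col2').first().reset_index()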
This yields:
Col2 Col1 Col3 Col4
0 m1 t1 1 low
1 m2 t2 7 med
2 m3 t5 5 high
3 m4 t8 9 high
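For larger frames, the same labels can probably be computed without calling a Python function per group. The following is a minimal sketch of one vectorized alternative, not the method used above: `label_groups` is a hypothetical helper name, and it assumes Col3 has no missing values and that sorting Col1 as strings gives the intended order.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [['t1', 'm1', 1], ['t2', 'm2', 7], ['t3', 'm1', 4], ['t4', 'm2', 8],
     ['t5', 'm3', 5], ['t6', 'm3', 2], ['t7', 'm3', 11], ['t8', 'm4', 9]],
    columns=['Col1', 'Col2', 'Col3'])

def label_groups(frame, x):
    """Hypothetical helper: return one labelled row per Col2 group."""
    ordered = frame.sort_values(['Col2', 'Col1'])
    grouped = ordered.groupby('Col2')
    first = grouped.first()          # first row of each group (lowest Col1)
    size = grouped.size()            # number of rows in each group
    # True for rows after the first in their group whose Col3 is <= x
    rest_low = (ordered['Col3'] <= x) & (grouped.cumcount() > 0)
    any_rest_low = rest_low.groupby(ordered['Col2']).any()
    first_high = first['Col3'] > x
    # Conditions are tried in order: 'low' if the first value is <= x,
    # otherwise 'high' for single-row groups or groups with a later
    # value <= x, otherwise 'med'.
    first['Col4'] = np.select(
        [~first_high, (size == 1) | any_rest_low],
        ['low', 'high'],
        default='med')
    return first.reset_index()

result = label_groups(df, 4)
print(result)
```

One caveat that applies to both versions: Col1 is sorted as text, so a label such as 't10' would sort before 't2'. If the numeric order matters, a numeric sort key would be needed.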