python - Create a new DataFrame from conditions on multiple variables and extract the reason for failure - pandas
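Problem description
I have the code below.
Given the following limits, I want to create the output shown further down:
A > 5, B > 4, C > 3
If a row does not meet the conditions, I want to read the following rows in the DataFrame, store the data, and create a new column named "Fail Reason" that lists which of A, B, or C failed.
I then want the script to also report the corresponding "X", "Y", and "Z" values for the DataFrame rows that pass.
After that, the script should group by "Group" and show the maximum "Hs" for each group.
I'm really struggling to get this working with multiple variables in my DataFrame... any help would be much appreciated.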
Desired output
   Group   Hs Fail Reason    X    Y     Z
0      1  1.0      [A, B]  0.9  1.9  0.54
1      2  0.5   [A, B, C]  0.8  2.7  0.43
Main code - my current attempt
import pandas as pd

data = [[1, 0.5, 8, 8, 8, 0.85, 1.64, 0.5],
        [1, 1, 8, 8, 8, 0.9, 1.9, 0.54],
        [1, 1.5, 0, 0, 10, 1.1, 2.0, 0.74],
        [2, 0.5, 6, 5, 4, 0.8, 2.7, 0.43],
        [2, 1, 1, 1, 1, 0.9, 2.9, 0.45],
        [2, 1.5, 1, 2, 1, 1.1, 3.1, 0.47]]
columns = ['Group', 'Hs', 'A', 'B', 'C', 'X', 'Y', 'Z']
df = pd.DataFrame(data=data, columns=columns)

Limit_A = 5
Limit_B = 4
Limit_C = 3

# Opens an empty dataframe for appending
df_new = pd.DataFrame(columns=['Group', 'Hs'])
groups = df['Group'].unique()

for g in groups:
    # Create new temp dataframe for this group
    df_1 = df[df['Group'] == g]
    # Input conditions: checks each column against its limit. Outputs boolean values.
    pass_criteria = (df_1['A'] > Limit_A) & (df_1['B'] > Limit_B) & (df_1['C'] > Limit_C)
    # PASSES DATAFRAME. Locates rows where pass_criteria is SATISFIED and creates another temp dataframe.
    df_passes = df_1.loc[pass_criteria]
    # Find the max value in the dataframe, e.g. the greatest operational wave height
    max_num = df_passes['Hs'].max()
    # Intended as the opposite of pass_criteria
    fail_criteria = (df_1['A'] < Limit_A) & (df_1['B'] < Limit_B) & (df_1['C'] < Limit_C)
    # FAILED DATAFRAME. Locates rows where fail_criteria is SATISFIED and creates another temp dataframe.
    df_fails = df_1.loc[fail_criteria]
    # Uses the FAIL dataframe and makes the value_vars rows in the melted dataframe
    melted = pd.melt(df_fails, value_vars=['A', 'B', 'C'])
    # Pulls out the reason for fails, i.e. when the condition of df_fails is not met. set() keeps unique values.
    fails = list(set(melted[melted['value'] > Limit_A]['variable']))
    # Input columns of desired outputs.
    df_e = pd.DataFrame(columns=['Group', 'Hs', 'Fail Reason'])
    # Inputs the values as defined above.
    df_e.loc[0] = [g, max_num, fails]
    # Appends to the dataframe in a loop (DataFrame.append was removed in pandas 2.0)
    df_new = pd.concat([df_new, df_e])

print(df_new)
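Solution
IIUC, first compare columns A, B, and C against your limits, then use agg to collect the names of the failing columns for each failing row, and finally map that result back onto each group: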
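# Boolean frame: True where a value exceeds its limit (A > 5, B > 4, C > 3)
res = df[["A", "B", "C"]] > [5, 4, 3]

# For each failing row, list the columns that failed; keep the first failing row per group
s = (pd.concat([df, (~res[~res.all(axis=1)]).agg(lambda x: res.columns[x].tolist(),
                                                 axis=1).rename("Fail reason")], axis=1)
       .dropna().drop_duplicates("Group").set_index("Group")["Fail reason"])

# Keep passing rows only, then take the row with the largest Hs in each group
print(df.assign(failed_reason=df["Group"].map(s))
        .loc[res.all(axis=1)].sort_values(["Group", "Hs"])
        .drop_duplicates("Group", keep="last"))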
   Group   Hs  A  B  C    X    Y     Z failed_reason
1      1  1.0  8  8  8  0.9  1.9  0.54        [A, B]
3      2  0.5  6  5  4  0.8  2.7  0.43     [A, B, C]
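The same idea can also be spelled out step by step, which may be easier to follow than the chained version. This is only a sketch under some assumptions: the names limits, ok, row_passes, fail_reason, reason_by_group, and result are made up for illustration, and, mirroring the answer above, the fail reason for each group is taken from the first failing row in that group.

```python
import pandas as pd

data = [
    [1, 0.5, 8, 8, 8, 0.85, 1.64, 0.50],
    [1, 1.0, 8, 8, 8, 0.90, 1.90, 0.54],
    [1, 1.5, 0, 0, 10, 1.10, 2.00, 0.74],
    [2, 0.5, 6, 5, 4, 0.80, 2.70, 0.43],
    [2, 1.0, 1, 1, 1, 0.90, 2.90, 0.45],
    [2, 1.5, 1, 2, 1, 1.10, 3.10, 0.47],
]
df = pd.DataFrame(data, columns=["Group", "Hs", "A", "B", "C", "X", "Y", "Z"])

# Hypothetical helper: one limit per checked column
limits = pd.Series({"A": 5, "B": 4, "C": 3})

# True where a value exceeds its limit; a row passes only if all three do
ok = df[limits.index].gt(limits, axis="columns")
row_passes = ok.all(axis=1)

# For every row, the names of the columns that did not exceed their limit
fail_reason = pd.Series(
    [list(ok.columns[~row]) for row in ok.to_numpy()],
    index=df.index,
    name="Fail Reason",
)

# Assumption: report the reason from the first failing row of each group
reason_by_group = (
    df.loc[~row_passes, ["Group"]]
    .join(fail_reason)
    .drop_duplicates("Group")
    .set_index("Group")["Fail Reason"]
)

# Among passing rows, keep the row with the largest Hs in each group
passed = df[row_passes]
best = passed.loc[passed.groupby("Group")["Hs"].idxmax()]

result = (
    best.assign(**{"Fail Reason": best["Group"].map(reason_by_group)})
    [["Group", "Hs", "Fail Reason", "X", "Y", "Z"]]
    .reset_index(drop=True)
)
print(result)
```

Note that a row fails as soon as any single limit is missed, so the failing rows are selected with ~row_passes rather than by requiring all three columns to be below their limits.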