python - How to create new columns based on whether another group of columns exists
Problem description
My problem is as follows:
I have a dataframe df
with 5 columns, say ('A', 'B', 'C', 'D', 'E').
I want to combine these columns in sets, say GP1 = ['A', 'B', 'D']
and GP2 = ['C', 'E'],
and create two new columns from them:
df['Group1'] = df[GP1].min(axis=1)
df['Group2'] = df[GP2].max(axis=1)
However, depending on the data, column 'A' (or 'D' or 'B', or all of them) may be missing from the first set, or column 'C' or 'E' (or both) may be missing from the second set.
So I want the code to check whether any column from the first or second set is missing, create the new 'Group1' or 'Group2' column only if all columns of that group exist, and skip creating it otherwise.
How can I achieve that? I tried for loops, but the logic became complicated and did not work.
An example where all the columns in both sets are present:
df_in
A B C D E
1 2 3 4 5
2 4 6 2 3
1 0 2 4 2
df_out
A B C D E Group1 Group2
1 2 3 4 5 1 5
2 4 6 2 3 2 6
1 0 2 4 2 0 2
An example where column E from the second group is missing:
df_in
A B C D
1 2 3 4
2 4 6 2
1 0 2 4
df_out
A B C D Group1
1 2 3 4 1
2 4 6 2 2
1 0 2 4 0
An example where both A and D are missing from set 1 (only B is present from group 1):
df_in
B C E
2 3 5
4 6 3
0 2 2
df_out
B C E Group2
2 3 5 5
4 6 3 6
0 2 2 2
An example where A is missing from set 1 and C is missing from set 2:
df_in
B D E
2 4 5
4 2 3
0 4 2
df_out
B D E
2 4 5
4 2 3
0 4 2
Any help in this direction will be immensely appreciated. Thanks
Solution
Here you go, I think you can use this:
df_out = (df_in.assign(Group1=df_in.reindex(gp1, axis=1).dropna().min(axis=1),
                       Group2=df_in.reindex(gp2, axis=1).dropna().max(axis=1))
               .dropna(axis=1, how='all'))
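To see why this chain skips an incomplete group, it can help to run it one step at a time. The sketch below assumes a small hand-built frame without column E; the `step*` names are only for illustration and are not part of the answer.

```python
import pandas as pd

# Illustrative frame with column 'E' missing (values taken from the question)
df_in = pd.DataFrame({"A": [1, 2, 1], "B": [2, 4, 0], "C": [3, 6, 2], "D": [4, 2, 4]})
gp2 = ["C", "E"]

# reindex adds the missing column 'E', filled with NaN
step1 = df_in.reindex(gp2, axis=1)

# dropna() removes every row that contains a NaN, so no rows survive
step2 = step1.dropna()

# the row-wise max of a frame with no rows is an empty Series
step3 = step2.max(axis=1)

# assign aligns on the index, so 'Group2' comes out as all NaN
step4 = df_in.assign(Group2=step3)

# dropping columns that are entirely NaN removes 'Group2' again
result = step4.dropna(axis=1, how="all")
print(result.columns.tolist())
```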
MCVE:
import pandas as pd
df_in = pd.read_clipboard() # Read df_in after copying it from the question above
print(df_in)
# A B C D E
# 0 1 2 3 4 5
# 1 2 4 6 2 3
# 2 1 0 2 4 2
gp1 = ['A','B','D']
gp2 = ['C','E']
df_out = (df_in.assign(Group1=df_in.reindex(gp1, axis=1).dropna().min(axis=1),
                       Group2=df_in.reindex(gp2, axis=1).dropna().max(axis=1))
               .dropna(axis=1, how='all'))
print(df_out)
# A B C D E Group1 Group2
# 0 1 2 3 4 5 1 5
# 1 2 4 6 2 3 2 6
# 2 1 0 2 4 2 0 2
df_in_copy = df_in.copy() # Keep a copy of the full frame to reuse later
df_in = df_in.drop('E', axis=1) # Drop column E
print(df_in)
# A B C D
# 0 1 2 3 4
# 1 2 4 6 2
# 2 1 0 2 4
df_out = (df_in.assign(Group1=df_in.reindex(gp1, axis=1).dropna().min(axis=1),
                       Group2=df_in.reindex(gp2, axis=1).dropna().max(axis=1))
               .dropna(axis=1, how='all'))
print(df_out)
# A B C D Group1
# 0 1 2 3 4 1
# 1 2 4 6 2 2
# 2 1 0 2 4 0
df_in = df_in_copy.copy() # Restore the full frame from the saved copy
df_in = df_in.drop(['A','D'], axis=1) # Drop columns A and D
print(df_in)
# B C E
# 0 2 3 5
# 1 4 6 3
# 2 0 2 2
df_out = (df_in.assign(Group1=df_in.reindex(gp1, axis=1).dropna().min(axis=1),
                       Group2=df_in.reindex(gp2, axis=1).dropna().max(axis=1))
               .dropna(axis=1, how='all'))
print(df_out)
# B C E Group2
# 0 2 3 5 5
# 1 4 6 3 6
# 2 0 2 2 2
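One caveat with the chain above: if the data itself contains NaN, a row with a NaN in a group gets NaN for that group's column, and any existing column that is entirely NaN is dropped as well. A more explicit alternative is to test column membership directly. This is only a sketch and not part of the original answer; `add_group_columns` and the `groups` mapping are hypothetical names introduced for illustration.

```python
import pandas as pd

def add_group_columns(df, groups):
    """Return a copy of df with one aggregate column per complete group.

    groups maps a new column name to (source columns, aggregation name).
    Hypothetical helper, not a pandas API.
    """
    out = df.copy()
    for new_col, (cols, how) in groups.items():
        # Create the column only when every source column is present
        if set(cols).issubset(df.columns):
            out[new_col] = df[cols].agg(how, axis=1)
    return out

groups = {
    "Group1": (["A", "B", "D"], "min"),
    "Group2": (["C", "E"], "max"),
}

# Frame from the question's third example: A and D are missing
df_in = pd.DataFrame({"B": [2, 4, 0], "C": [3, 6, 2], "E": [5, 3, 2]})
df_out = add_group_columns(df_in, groups)
print(df_out)
```

Because the check happens before any aggregation, no placeholder NaN column is ever created, so existing columns and NaN values in the data are left untouched.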