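python - Remove NaN values from the right-hand columns while keeping the values in the left-hand columns
Problem description
I merged three DataFrames together and then removed duplicates from the result. However, when I remove duplicates from the last three columns, I get NaN values near the top of the DataFrame. I want to remove those NaNs, but I can't seem to find a way to do it.
Here is my code so far: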
import pandas as pd
from functools import reduce

bDF=pd.read_csv(bRaw)
pDF=pd.read_csv(pRaw)
mDF=pd.read_csv(mRaw)
del bRaw,pRaw,mRaw
#Merge Together DataFrames on the Value Role Name
dfs=[bDF,pDF,mDF]
df_merged = reduce(lambda left,right: pd.merge(left,right,on=['R1'],
                                                how='outer'), dfs)
del bDF,pDF,mDF,dfs
#Rearrange Columns
cols=df_merged.columns.tolist()
cols=cols[0:1]+cols[-3:]+cols[1:5]
df_merged=df_merged[cols]
Output after merging:
+------+-----+------+----+--------+--------+--------+--------+
| R | C | D | JC | R | PM | Nme | Vle |
+------+-----+------+----+--------+--------+--------+--------+
| JMAC | 305 | 3302 | I6 | Cofow | Value1 | Value1 | Value1 |
| JMAC | 305 | 3915 | R6 | Cofow | Value1 | Value1 | Value1 |
| JMAC | 301 | 3302 | I6 | Cofow | Value1 | Value1 | Value1 |
| JMAC | 301 | 3915 | R6 | Cofow | Value1 | Value1 | Value1 |
| JMAC | 305 | 3302 | I6 | Cofow | Value2 | Value2 | Value2 |
| JMAC | 305 | 3915 | R6 | Cofow | Value2 | Value2 | Value2 |
| JMAC | 301 | 3302 | I6 | Cofow | Value2 | Value2 | Value2 |
| JMAC | 301 | 3915 | R6 | Cofow | Value2 | Value2 | Value2 |
| JMAC | 305 | 3302 | I6 | Cofow | Value3 | Value3 | Value3 |
| JMAC | 305 | 3915 | R6 | Cofow | Value3 | Value3 | Value3 |
| JMAC | 301 | 3302 | I6 | Cofow | Value3 | Value3 | Value3 |
| JMAC | 301 | 3915 | R6 | Cofow | Value3 | Value3 | Value3 |
| JMAC | 305 | 3302 | I6 | Cofow | Value4 | Value4 | Value4 |
| JMAC | 305 | 3915 | R6 | Cofow | Value4 | Value4 | Value4 |
| JMAC | 301 | 3302 | I6 | Cofow | Value4 | Value4 | Value4 |
| JMAC | 301 | 3915 | R6 | Cofow | Value4 | Value4 | Value4 |
| JMAP | 301 | 3315 | I6 | Cofowd | Value6 | Value6 | Value6 |
| JMAP | 301 | 3916 | R6 | Cofowd | Value6 | Value6 | Value6 |
| JMAP | 305 | 3314 | I6 | Cofowd | Value6 | Value6 | Value6 |
| JMAP | 305 | 3315 | R6 | Cofowd | Value6 | Value6 | Value6 |
| JMAP | 305 | 3916 | R6 | Cofowd | Value6 | Value6 | Value6 |
| JMAP | 301 | 3315 | I6 | Cofowd | Value7 | Value7 | Value7 |
| JMAP | 301 | 3916 | R6 | Cofowd | Value7 | Value7 | Value7 |
| JMAP | 305 | 3314 | I6 | Cofowd | Value7 | Value7 | Value7 |
| JMAP | 305 | 3315 | R6 | Cofowd | Value7 | Value7 | Value7 |
| JMAP | 305 | 3916 | R6 | Cofowd | Value7 | Value7 | Value7 |
| JMAP | 301 | 3315 | I6 | Cofowd | Value8 | Value8 | Value8 |
| JMAP | 301 | 3916 | R6 | Cofowd | Value8 | Value8 | Value8 |
| JMAP | 305 | 3314 | I6 | Cofowd | Value8 | Value8 | Value8 |
| JMAP | 305 | 3315 | R6 | Cofowd | Value8 | Value8 | Value8 |
| JMAP | 305 | 3916 | R6 | Cofowd | Value8 | Value8 | Value8 |
| JMAP | 301 | 3315 | I6 | Cofowd | Value9 | Value9 | Value9 |
| JMAP | 301 | 3916 | R6 | Cofowd | Value9 | Value9 | Value9 |
| JMAP | 305 | 3314 | I6 | Cofowd | Value9 | Value9 | Value9 |
| JMAP | 305 | 3315 | R6 | Cofowd | Value9 | Value9 | Value9 |
| JMAP | 305 | 3916 | R6 | Cofowd | Value9 | Value9 | Value9 |
+------+-----+------+----+--------+--------+--------+--------+
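The merge-and-reorder code above can be reproduced on small in-memory frames. This is a minimal sketch assuming hypothetical toy data: only the shared key `R1` comes from the question, while the column names `C`, `PM` and `Nme` and all values are made up. `reduce` folds `pd.merge` over the list, so the frames are joined pairwise from left to right.

```python
from functools import reduce

import pandas as pd

# Hypothetical toy frames; only the shared key 'R1' comes from the question
bDF = pd.DataFrame({"R1": ["JMAC", "JMAP"], "C": [305, 301]})
pDF = pd.DataFrame({"R1": ["JMAC", "JMAP"], "PM": ["Value1", "Value6"]})
mDF = pd.DataFrame({"R1": ["JMAC", "JMAP"], "Nme": ["Value1", "Value6"]})

# Fold pd.merge over the list: (bDF merge pDF) merge mDF, outer join on 'R1'
df_merged = reduce(
    lambda left, right: pd.merge(left, right, on=["R1"], how="outer"),
    [bDF, pDF, mDF],
)

# Same slicing idea as the question: key first, then the trailing
# column(s), then everything in between (here one trailing column)
cols = df_merged.columns.tolist()
cols = cols[0:1] + cols[-1:] + cols[1:-1]
df_merged = df_merged[cols]
print(df_merged)
```

With `how="outer"`, a key missing from one of the frames keeps its row and gets NaN in that frame's columns instead of being dropped.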
Then I remove duplicates from the first 4 columns, then from the last three columns, and finally from the middle column:
#Remove Duplicate Values
df_merged[cols[0:-3]]=df_merged[cols[0:-3]].mask(df_merged[cols[:-3]].duplicated())
df_merged[cols[-3:]]=df_merged[cols[-3:]].mask(df_merged[cols[-3:]].duplicated())
df_merged[cols[4:5]]=df_merged[cols[4:5]].mask(df_merged[cols[4:5]].duplicated())
df_merged=df_merged.dropna(how='all')
My output is close to the final form I need:
+------+-----+------+----+-------+---------+---------+---------+
| R | C | D | JC | R | PM | Nme | Vle |
+------+-----+------+----+-------+---------+---------+---------+
| JMAC | 305 | 3302 | I6 | Cofow | Value1 | Value1 | Value1 |
| JMAC | 305 | 3915 | R6 | | NaN | NaN | NaN |
| JMAC | 301 | 3302 | I6 | | NaN | NaN | NaN |
| JMAC | 301 | 3915 | R6 | | NaN | NaN | NaN |
| | | | | | Value2 | Value2 | Value2 |
| | | | | | Value3 | Value3 | Value3 |
| | | | | | Value4 | Value4 | Value4 |
| | | | | | Value6 | Value6 | Value6 |
| | | | | | Value7 | Value7 | Value7 |
| JMAP | 301 | 3315 | I6 | Cofow | Value8 | Value8 | Value8 |
| JMAP | 301 | 3916 | R6 | | NaN | NaN | NaN |
| JMAP | 305 | 3314 | I6 | | NaN | NaN | NaN |
| JMAP | 305 | 3315 | R6 | | NaN | NaN | NaN |
| JMAP | 305 | 3916 | R6 | | NaN | NaN | NaN |
| | | | | | Value9 | Value9 | Value9 |
| | | | | | Value10 | Value10 | Value10 |
| | | | | | Value11 | Value11 | Value11 |
| | | | | | Value12 | Value12 | Value12 |
| | | | | | Value13 | Value13 | Value13 |
+------+-----+------+----+-------+---------+---------+---------+
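The masking code above combines `DataFrame.duplicated`, which flags each row whose values in the selected columns already appeared in an earlier row, with `DataFrame.mask`, which sets cells to NaN wherever the condition is True. A minimal sketch on a hypothetical toy frame (column names and values are assumptions) shows why the NaNs appear:

```python
import pandas as pd

# Hypothetical toy frame: a key block (R, C) and a value block (PM)
df = pd.DataFrame({
    "R": ["JMAC", "JMAC", "JMAC", "JMAC"],
    "C": [305, 305, 301, 305],
    "PM": ["Value1", "Value2", "Value1", "Value2"],
})
key_cols, val_cols = ["R", "C"], ["PM"]

# duplicated() flags rows whose values in these columns appeared earlier;
# mask() sets every cell of a flagged row to NaN (the row itself stays)
df[key_cols] = df[key_cols].mask(df[key_cols].duplicated())
df[val_cols] = df[val_cols].mask(df[val_cols].duplicated())

# Only rows in which *every* column became NaN are removed
df = df.dropna(how="all")
print(df)
```

Because each column block is masked independently, a row is only dropped once both halves have been blanked; rows where just one half is NaN survive, which matches the gaps in the output above.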
My problem is that I want to get rid of the NaN values and shift the remaining values up, so that my final result looks like this:
+------+-----+------+----+-------+---------+---------+---------+
| R | C | D | JC | R | PM | Nme | Vle |
+------+-----+------+----+-------+---------+---------+---------+
| JMAC | 305 | 3302 | I6 | Cofow | Value1 | Value1 | Value1 |
| JMAC | 305 | 3915 | R6 | | Value2 | Value2 | Value2 |
| JMAC | 301 | 3302 | I6 | | Value3 | Value3 | Value3 |
| JMAC | 301 | 3915 | R6 | | Value4 | Value4 | Value4 |
| | | | | | Value6 | Value6 | Value6 |
| | | | | | Value7 | Value7 | Value7 |
| JMAP | 301 | 3315 | I6 | Cofow | Value8 | Value8 | Value8 |
| JMAP | 301 | 3916 | R6 | | Value9 | Value9 | Value9 |
| JMAP | 305 | 3314 | I6 | | Value10 | Value10 | Value10 |
| JMAP | 305 | 3315 | R6 | | Value11 | Value11 | Value11 |
| JMAP | 305 | 3916 | R6 | | Value12 | Value12 | Value12 |
| | | | | | Value13 | Value13 | Value13 |
+------+-----+------+----+-------+---------+---------+---------+
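I tried splitting the columns into two separate DataFrames, dropping the NAs, and then combining them again, but the data ends up misaligned because of the index: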
df3=pd.concat([df2,df1], axis=1, ignore_index=False)
Any help or ideas would be great!
Many thanks,
Gist
Solution
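The index problem in the concat attempt comes from `pd.concat(..., axis=1)` aligning rows by index label, while `dropna` keeps each surviving row's original label. Calling `reset_index(drop=True)` on each half first makes the halves line up by position instead. Because the desired output keeps the JMAC and JMAP blocks apart, this sketch applies the idea per block, under the assumption that a block can be identified by forward-filling the first column. The column names, the toy data and the helper `compact` are hypothetical; this sketches the split/dropna/concat approach from the question, not the answer that follows.

```python
import numpy as np
import pandas as pd

# Hypothetical frame shaped like the "close" output: a left block (R, C)
# and a right block (PM), with gaps in both
df = pd.DataFrame({
    "R": ["JMAC", "JMAC", np.nan, np.nan, "JMAP", "JMAP", np.nan],
    "C": [305, 301, np.nan, np.nan, 301, 305, np.nan],
    "PM": ["Value1", np.nan, "Value2", "Value3", "Value8", np.nan, "Value9"],
})
left_cols, right_cols = ["R", "C"], ["PM"]


def compact(block: pd.DataFrame) -> pd.DataFrame:
    """Drop empty rows from each side of one block, then rejoin by position."""
    left = block[left_cols].dropna(how="all").reset_index(drop=True)
    right = block[right_cols].dropna(how="all").reset_index(drop=True)
    # Both sides now use labels 0..n-1, so axis=1 lines them up row by row;
    # the shorter side is padded with NaN
    return pd.concat([left, right], axis=1)


# Assumption: a row with a blank left side belongs to the nearest R above it
block_key = df["R"].ffill()
parts = [compact(g) for _, g in df.groupby(block_key, sort=False)]
result = pd.concat(parts, ignore_index=True)
print(result)
```

If the same value in the first column can start two separate blocks, grouping on the forward-filled value would merge them; a counter that increments at each new block start would be a safer key in that case.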
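"Then I remove duplicates from the first 4 columns, then from the last three columns, and finally from the middle column."
Assuming those are the steps you want to perform, try drop_duplicates. Here is an example that performs them, in your order, in a single command: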
df = df.drop_duplicates(
subset=['col1', 'col2', 'col3', 'col4']).drop_duplicates(
subset=['col6', 'col7', 'col8']).drop_duplicates(
subset=['col5'])
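A runnable version of the chained call above, on a hypothetical toy frame that reuses the answer's placeholder names `col1` to `col8` (the values are made up). Each `drop_duplicates` call only sees the rows that survived the previous call, and unlike the question's `mask` approach it removes whole rows rather than blanking individual cells.

```python
import pandas as pd

# Hypothetical toy frame using the answer's placeholder column names
df = pd.DataFrame({
    "col1": ["JMAC", "JMAC", "JMAC", "JMAP"],
    "col2": [305, 305, 301, 301],
    "col3": [3302, 3302, 3302, 3315],
    "col4": ["I6", "I6", "I6", "I6"],
    "col5": ["Cofow", "Cofow", "Cofow", "Cofowd"],
    "col6": ["Value1", "Value2", "Value1", "Value6"],
    "col7": ["Value1", "Value2", "Value1", "Value6"],
    "col8": ["Value1", "Value2", "Value1", "Value6"],
})

# Each call keeps the first row of every duplicate group within its subset
# and drops the later ones; the next call works on the surviving rows only
df = (
    df.drop_duplicates(subset=["col1", "col2", "col3", "col4"])
      .drop_duplicates(subset=["col6", "col7", "col8"])
      .drop_duplicates(subset=["col5"])
)
print(df)
```

Here row 1 is dropped by the first call (same col1 to col4 as row 0) and row 2 by the second (same col6 to col8 as row 0), leaving rows 0 and 3.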
You can also use the keep parameter (e.g. keep='first' vs keep='last') to change which rows are dropped and which are kept.
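A small sketch of how keep changes the result, on a hypothetical frame with one repeated key. Besides 'first' (the default) and 'last', pandas also accepts keep=False, which drops every row belonging to a duplicated group.

```python
import pandas as pd

# Hypothetical toy frame with one repeated key
df = pd.DataFrame({"key": ["a", "a", "b"], "val": [1, 2, 3]})

first = df.drop_duplicates(subset=["key"], keep="first")      # keeps rows 0 and 2
last = df.drop_duplicates(subset=["key"], keep="last")        # keeps rows 1 and 2
unique_only = df.drop_duplicates(subset=["key"], keep=False)  # keeps row 2 only
```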