python - Subtracting DataFrames with some matching and some non-matching columns and indexes
Question
How do I subtract two DataFrames that share some, but not all, of their columns and index labels?
df_diff = df_add - df_subtract
df_diff = df_add.subtract(df_subtract)
where:
df_add:
     1    2    3    4
A  1.1  1.2  1.3  1.4
B  2.1  2.2  2.3  2.4
D  3.1  3.2  3.3  3.4
E  4.1  4.2  4.3  4.4
df_subtract:
   2   4
B  5   8
C  6   9
D  7  10
Desired df_diff:
     1    2    3    4
A  1.1  1.2  1.3  1.4
B  2.1 -2.8  2.3 -5.6
C    0   -6    0   -9
D  3.1 -3.8  3.3 -6.6
E  4.1  4.2  4.3  4.4
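Rebuilding the question's frames shows why both attempts fall short; they behave the same way. This is a minimal sketch that assumes the column labels are the integers 1-4 (the question does not say whether they are ints or strings). Arithmetic between DataFrames aligns on the union of row and column labels and puts NaN in every cell that is missing from either operand.

```python
import numpy as np
import pandas as pd

# Rebuild the question's frames (assumption: column labels are the integers 1-4)
df_add = pd.DataFrame(
    [[1.1, 1.2, 1.3, 1.4],
     [2.1, 2.2, 2.3, 2.4],
     [3.1, 3.2, 3.3, 3.4],
     [4.1, 4.2, 4.3, 4.4]],
    index=list("ABDE"), columns=[1, 2, 3, 4],
)
df_subtract = pd.DataFrame(
    [[5, 8], [6, 9], [7, 10]],
    index=list("BCD"), columns=[2, 4],
)

# Plain subtraction aligns on the union of labels (rows A-E, columns 1-4) ...
naive = df_add - df_subtract
print(naive)
# ... but only the overlap (rows B, D x columns 2, 4) holds numbers; the rest is NaN
print(int(naive.notna().sum().sum()))  # 4
```

The same NaN pattern comes out of df_add.subtract(df_subtract), since neither call is given a fill value.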
Solution
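Try reindex to give both DataFrames the same shape (the union of their index and column labels, with 0 in every newly introduced cell), then subtract as usual: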
# Union of all row (index) labels
new_idx = df_add.index.union(df_subtract.index)
# Union of all column labels
new_cols = df_add.columns.union(df_subtract.columns)
# Give both frames the same labels (new cells filled with 0), then subtract
df_diff = (
    df_add.reindex(index=new_idx, columns=new_cols, fill_value=0)
    -
    df_subtract.reindex(index=new_idx, columns=new_cols, fill_value=0)
)
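A note on fill_value: reindex uses it only for labels that reindex itself introduces. A NaN already stored in df_add or df_subtract is kept as NaN and carries through into df_diff. The tiny frame below is made up purely for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical one-column frame that already holds a NaN at label "a"
s = pd.DataFrame({"x": [np.nan, 1.0]}, index=["a", "b"])

# Reindex to add a new label "c"; fill_value=0 is used only for "c"
r = s.reindex(index=["a", "b", "c"], fill_value=0)
print(r)
# "a" keeps its original NaN, "b" is unchanged, "c" gets 0.0
```

If pre-existing NaN should also count as 0, appending .fillna(0) to the result is one option.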
Reshaped df_add:
     1    2    3    4
A  1.1  1.2  1.3  1.4
B  2.1  2.2  2.3  2.4
C  0.0  0.0  0.0  0.0
D  3.1  3.2  3.3  3.4
E  4.1  4.2  4.3  4.4
Reshaped df_subtract:
   1  2  3   4
A  0  0  0   0
B  0  5  0   8
C  0  6  0   9
D  0  7  0  10
E  0  0  0   0
df_diff:
     1    2    3    4
A  1.1  1.2  1.3  1.4
B  2.1 -2.8  2.3 -5.6
C  0.0 -6.0  0.0 -9.0
D  3.1 -3.8  3.3 -6.6
E  4.1  4.2  4.3  4.4
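To double-check this output against the target table from the question, here is a self-contained sketch, again assuming integer column labels 1-4. Index.union returns the sorted union of the labels, and the comparison uses np.allclose because values such as 2.2 - 5 are computed in floating point.

```python
import numpy as np
import pandas as pd

# Question data (assumption: integer column labels 1-4)
df_add = pd.DataFrame(
    [[1.1, 1.2, 1.3, 1.4], [2.1, 2.2, 2.3, 2.4],
     [3.1, 3.2, 3.3, 3.4], [4.1, 4.2, 4.3, 4.4]],
    index=list("ABDE"), columns=[1, 2, 3, 4],
)
df_subtract = pd.DataFrame([[5, 8], [6, 9], [7, 10]],
                           index=list("BCD"), columns=[2, 4])

new_idx = df_add.index.union(df_subtract.index)       # A, B, C, D, E
new_cols = df_add.columns.union(df_subtract.columns)  # 1, 2, 3, 4

df_diff = (
    df_add.reindex(index=new_idx, columns=new_cols, fill_value=0)
    - df_subtract.reindex(index=new_idx, columns=new_cols, fill_value=0)
)

# Target table from the question
expected = pd.DataFrame(
    [[1.1, 1.2, 1.3, 1.4], [2.1, -2.8, 2.3, -5.6], [0, -6, 0, -9],
     [3.1, -3.8, 3.3, -6.6], [4.1, 4.2, 4.3, 4.4]],
    index=list("ABCDE"), columns=[1, 2, 3, 4],
)
print(np.allclose(df_diff.to_numpy(), expected.to_numpy()))  # True
```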
Timing information via perfplot:
import numpy as np
import pandas as pd
import perfplot
np.random.seed(5)
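def gen_data(n):
    df_add = pd.DataFrame(np.random.random(size=(n, n)))
    # df_subtract: a random half of the rows and a random half of the columns
    df_subtract = pd.DataFrame(np.random.random(size=(n, n))) \
        .sample(frac=.5).sample(frac=.5, axis=1) \
        .sort_index().sort_index(axis=1)
    # For very small n the sampling can leave nothing
    if df_subtract.empty:
        return df_add, df_subtract
    # Drop roughly 20% of df_subtract's row labels (drawn with replacement)
    # from df_add, so that some rows exist only in df_subtract
    return (
        df_add.drop(np.random.choice(df_subtract.index,
                                     max(1, int(df_subtract.shape[0] * .2)))),
        df_subtract
    )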
def reindex(dfs):
    df_add, df_subtract = dfs
    new_idx = df_add.index.union(df_subtract.index)
    new_cols = df_add.columns.union(df_subtract.columns)
    return (
        df_add.reindex(index=new_idx, columns=new_cols, fill_value=0)
        -
        df_subtract.reindex(index=new_idx, columns=new_cols, fill_value=0)
    )
def sub(dfs):
    df_add, df_subtract = dfs
    # fill_value=0 covers cells missing from one side; fillna(0) covers cells missing from both
    return df_add.sub(df_subtract, fill_value=0).fillna(0)
def combine_first(dfs):
    df_add, df_subtract = dfs
    # Cells present only in df_subtract must be filled from -df_subtract, not df_subtract
    return ((df_add - df_subtract)
            .combine_first(df_add)
            .combine_first(-df_subtract)
            .fillna(0))
if __name__ == '__main__':
    out = perfplot.bench(
        setup=gen_data,
        kernels=[
            sub,
            reindex,
            combine_first
        ],
        labels=[
            'sub @ScottBoston',
            'reindex @HenryEcker',
            'combine_first @DYZ'
        ],
        n_range=[2 ** k for k in range(15)],
        equality_check=None
    )
    out.save('perfplot_results.png', transparent=False)
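To see what the other two benchmarked approaches do on the question's data, here is a small sketch (assuming integer column labels 1-4). DataFrame.sub(..., fill_value=0) substitutes 0 only where one of the two operands has a value, so cells absent from both frames (row C, columns 1 and 3) stay NaN until fillna(0). In the combine_first chain, cells that exist only in df_subtract have to be filled from -df_subtract; filling from df_subtract itself gives +6 and +9 in row C instead of -6 and -9.

```python
import numpy as np
import pandas as pd

# Question data (assumption: integer column labels 1-4)
df_add = pd.DataFrame(
    [[1.1, 1.2, 1.3, 1.4], [2.1, 2.2, 2.3, 2.4],
     [3.1, 3.2, 3.3, 3.4], [4.1, 4.2, 4.3, 4.4]],
    index=list("ABDE"), columns=[1, 2, 3, 4],
)
df_subtract = pd.DataFrame([[5, 8], [6, 9], [7, 10]],
                           index=list("BCD"), columns=[2, 4])

# sub with fill_value: cells missing from both operands remain NaN
partial = df_add.sub(df_subtract, fill_value=0)
print(partial.loc["C"].isna().tolist())  # [True, False, True, False]
via_sub = partial.fillna(0)

# combine_first: the overlap first, then df_add's own cells, then the negated df_subtract
via_combine = ((df_add - df_subtract)
               .combine_first(df_add)
               .combine_first(-df_subtract)
               .fillna(0))
# Filling from df_subtract without negating it flips the sign of row C
unsigned = ((df_add - df_subtract)
            .combine_first(df_add)
            .combine_first(df_subtract)
            .fillna(0))
print(via_combine.loc["C"].tolist())  # [0.0, -6.0, 0.0, -9.0]
print(unsigned.loc["C"].tolist())     # [0.0, 6.0, 0.0, 9.0]

# Both corrected approaches agree once their labels are lined up
print(np.allclose(via_sub.to_numpy(),
                  via_combine.reindex_like(via_sub).to_numpy()))  # True
```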