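python - Compare all column names of multiple Pandas DataFrames
Problem description
I'm looking for a simpler way to compare the column names of multiple DataFrames, with output similar to the code below, or for a way to make this function more concise. Every time I try to simplify the function with *args or **kwargs, I get either "IndexError: list index out of range" or "function takes 0 positional arguments but 1 was given". Any input is greatly appreciated.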
```python
import numpy as np
import pandas as pd

application_df = pd.read_csv("application_train.csv")
bureau_df = pd.read_csv("bureau.csv")
bureau_balance_df = pd.read_csv("bureau_balance.csv")
previous_application_df = pd.read_csv("previous_application.csv")
POS_CASH_balance_df = pd.read_csv("POS_CASH_balance.csv")
installments_payments_df = pd.read_csv("installments_payments.csv")
credit_card_balance_df = pd.read_csv("credit_card_balance.csv")
sample__submission_df = pd.read_csv("sample_submission.csv")


def column_compare(df1, df2, df3, df4, df5, df6, df7, df8):
    alist = []
    blist = []
    clist = []
    dlist = []
    elist = []
    flist = []
    glist = []
    hlist = []
    # For each frame: find the global variable name bound to it
    # (IndexError if no global refers to that object), then collect
    # its column labels (iterating over a DataFrame yields them).
    name1 = [x for x in globals() if globals()[x] is df1][0]
    for a in df1:
        alist.append(a)
    name2 = [x for x in globals() if globals()[x] is df2][0]
    for b in df2:
        blist.append(b)
    name3 = [x for x in globals() if globals()[x] is df3][0]
    for c in df3:
        clist.append(c)
    name4 = [x for x in globals() if globals()[x] is df4][0]
    for d in df4:
        dlist.append(d)
    name5 = [x for x in globals() if globals()[x] is df5][0]
    for e in df5:
        elist.append(e)
    name6 = [x for x in globals() if globals()[x] is df6][0]
    for f in df6:
        flist.append(f)
    name7 = [x for x in globals() if globals()[x] is df7][0]
    for g in df7:
        glist.append(g)
    name8 = [x for x in globals() if globals()[x] is df8][0]
    for h in df8:
        hlist.append(h)
    dfs = {name1: alist, name2: blist, name3: clist, name4: dlist,
           name5: elist, name6: flist, name7: glist, name8: hlist}
    # One row per DataFrame (shorter rows are padded with missing values);
    # transpose so each DataFrame becomes a column, then blank out the padding
    df = pd.DataFrame.from_dict(dfs, orient='index')
    df = df.transpose().replace(np.nan, '')
    return df


# Show all rows; the bare "max_rows" pattern can match more than one option in newer pandas
pd.set_option("display.max_rows", None)
column_names = column_compare(application_df, bureau_df, bureau_balance_df,
                              previous_application_df, POS_CASH_balance_df,
                              installments_payments_df, credit_card_balance_df,
                              sample__submission_df)
column_names
```
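A more compact version of the same function can be sketched with `**kwargs`, assuming you are willing to name each DataFrame at the call site (for example `column_compare(application_df=application_df, bureau_df=bureau_df)`). The keyword then supplies the header directly, so the `globals()` lookup, which raises `IndexError` whenever no global variable refers to the object being searched for, is no longer needed. Note that a function defined as `def f(**kwargs)` accepts keyword arguments only; passing a DataFrame positionally raises `TypeError: f() takes 0 positional arguments but 1 was given`, the second error mentioned above. The `pd.concat` call is a substitute for the original `from_dict`/`transpose` pair, not something taken from the question.

```python
import pandas as pd


def column_compare(**dfs):
    """Side-by-side listing of column names, one column per keyword argument.

    Assumes every DataFrame is passed by keyword; the keyword becomes the
    column header, so no globals() lookup is required.
    """
    # One Series of column labels per DataFrame, keyed by its keyword name
    names = {key: pd.Series(list(df.columns)) for key, df in dfs.items()}
    # concat(axis=1) aligns the Series by position and pads shorter ones with NaN
    listing = pd.concat(names, axis=1)
    # Blank out the padding, as the original function does
    return listing.fillna("")


if __name__ == "__main__":
    # Toy frames standing in for the CSV files in the question
    df1 = pd.DataFrame(columns=list("abc"))
    df2 = pd.DataFrame(columns=list("abefg"))
    print(column_compare(df1=df1, df2=df2))
```

If you prefer positional calls, `def column_compare(*dfs)` works the same way, but the headers then have to come from somewhere else (for instance a separate list of names), because a function cannot reliably recover the variable names its arguments were bound to.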
Solution
Not sure if this is what you're after, but the column names are available via `df.columns`, so:
```python
import pandas as pd

df1 = pd.DataFrame(columns=list('abc'))
df2 = pd.DataFrame(columns=list('abefg'))
df3 = pd.DataFrame(columns=list('bfkm'))

# Turn each frame's column names into a single row of 1s labelled with the frame's name
cols1 = pd.DataFrame(index=df1.columns).assign(df1=1).T
cols2 = pd.DataFrame(index=df2.columns).assign(df2=1).T
cols3 = pd.DataFrame(index=df3.columns).assign(df3=1).T

# Stacking the rows aligns them on the union of column names; absent names become NaN, then 0
df_cols_compare = pd.concat([cols1, cols2, cols3]).fillna(0).astype('int')
print(df_cols_compare)
#      a  b  c  e  f  g  k  m
# df1  1  1  1  0  0  0  0  0
# df2  1  1  0  1  1  1  0  0
# df3  0  1  0  0  1  0  1  1
```
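The answer's pattern generalizes to any number of DataFrames once it is wrapped in a function. The sketch below assumes the frames are passed as keyword arguments whose names label the rows; `column_presence` is a hypothetical name, and the `df.columns.unique()` call is an added precaution against duplicate column labels, not part of the answer above.

```python
import pandas as pd


def column_presence(**dfs):
    """Indicator matrix: one row per keyword-named DataFrame, one column per
    distinct column label; 1 means the label is present in that frame, 0 absent."""
    rows = []
    for name, df in dfs.items():
        # Same trick as the answer: labels become the index of an empty frame,
        # a constant column named after the frame is added, then transposed
        # into a single row. unique() guards against duplicate labels.
        row = pd.DataFrame(index=df.columns.unique()).assign(**{name: 1}).T
        rows.append(row)
    # Row-wise concat takes the union of labels; absent labels become NaN, then 0
    return pd.concat(rows).fillna(0).astype("int")


if __name__ == "__main__":
    df1 = pd.DataFrame(columns=list("abc"))
    df2 = pd.DataFrame(columns=list("abefg"))
    df3 = pd.DataFrame(columns=list("bfkm"))
    presence = column_presence(df1=df1, df2=df2, df3=df3)
    print(presence)
    # Labels shared by every frame: columns that are 1 in every row
    print(presence.loc[:, presence.all()].columns.tolist())
```

Called as `column_presence(df1=df1, df2=df2, df3=df3)` it should reproduce the table above, and `presence.loc[:, presence.all()]` keeps only the columns present in every frame (here `b`). For the question's data the call would be `column_presence(application_df=application_df, bureau_df=bureau_df, ...)`, with the remaining frames added the same way.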