Difference in SparkSQL Dataframe columns

Question

How do I find the differences between the columns of two DataFrames? The mismatch is causing issues when I join the two DataFrames.

df1_cols = df1.columns
df2_cols = df2.columns

This returns the columns of the two DataFrames as two list variables.

Thanks

Tags: pyspark, pyspark-sql

Answer


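df.columns returns a plain Python list here, so you can use any Python tool to compare it with another list, i.e. df2_cols. For example, you can use set to check which columns the two DataFrames have in common and which appear in only one of them: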

df1_cols = df1.columns
df2_cols = df2.columns
set(df1_cols).intersection(set(df2_cols))  # check common columns
set(df1_cols) - set(df2_cols) # check columns in df1 but not in df2
set(df2_cols) - set(df1_cols) # check columns in df2 but not in df1
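
One caveat with the set approach is that sets discard column order and compare names case-sensitively, while Spark with its default settings resolves column names case-insensitively. The sketch below is one possible way to handle both; the helper name diff_columns and its case_sensitive parameter are made up for illustration and are not part of PySpark. Plain lists stand in for df1.columns and df2.columns so the example runs without a Spark session.

```python
def diff_columns(left_cols, right_cols, case_sensitive=True):
    """Compare two lists of column names, keeping the original column order.

    Hypothetical helper for illustration; not a PySpark API.
    """
    # Normalize names only for comparison; original spellings are returned.
    norm = (lambda c: c) if case_sensitive else str.lower
    left_keys = {norm(c) for c in left_cols}
    right_keys = {norm(c) for c in right_cols}
    return {
        "common": [c for c in left_cols if norm(c) in right_keys],
        "only_left": [c for c in left_cols if norm(c) not in right_keys],
        "only_right": [c for c in right_cols if norm(c) not in left_keys],
    }


# Stand-ins for df1.columns and df2.columns (assumed sample data).
df1_cols = ["id", "name", "amount"]
df2_cols = ["ID", "name", "created_at"]

strict = diff_columns(df1_cols, df2_cols)
loose = diff_columns(df1_cols, df2_cols, case_sensitive=False)

print(strict)
print(loose)
```

With real DataFrames you would pass df1.columns and df2.columns directly. Comparing case-insensitively can surface pairs such as id and ID that look different in a set difference but may still clash in a join.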
