Creating user defined function for joins (Python)

Problem description

I am looking for an easy way to define a function that will join tables consecutively when run. I am pretty new to Python, but have been given the task of building out a package that relies heavily on joins to work successfully.

I have done plenty of work in R, but will be finishing this in Python (unless I just hit a wall). The goal is to automate a complete task so that a dataframe can be passed in, pushed through a function, and then presented in a couple of different views. This would require one function for each view. Because of this, there are a

This is horrible, and as I am familiar with dplyr, I'm trying to use dfply to accomplish this.

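from dfply import inner_join, left_join

def get_hcc(df, df2, df3):
    # Inner join: df.col1 must match both df2.col2 and df2.col3
    df = df >> inner_join(df2, by=[('col1', 'col2'), ('col1', 'col3')])
    # Remove duplicate rows produced by the join
    df = df.drop_duplicates()
    # Left join the result onto df3 using the shared column col4
    df = df3 >> left_join(df, by='col4')
    return df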

If anyone has better ideas as to how to go about this, that would be greatly appreciated!

Thanks.

Tags: python, pandas, dfply

Solution
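
A minimal sketch of one way to go about this, using plain pandas `DataFrame.merge` rather than dfply. The helper name `join_all`, the `(frame, merge_kwargs)` step format, and the toy tables and column names are all assumptions made for illustration; none of them come from the question.

```python
import pandas as pd


def join_all(base, steps):
    """Merge `base` with each (frame, merge_kwargs) pair, in order.

    `merge_kwargs` is passed straight to DataFrame.merge, so each step
    can choose its own join type (`how`) and key columns (`on`,
    `left_on`, `right_on`).
    """
    result = base
    for frame, merge_kwargs in steps:
        # Join the running result to the next table, then drop exact
        # duplicate rows, as the function in the question does.
        result = result.merge(frame, **merge_kwargs).drop_duplicates()
    return result


# Hypothetical toy data standing in for the real tables.
members = pd.DataFrame({"member_id": [1, 2, 3], "name": ["a", "b", "c"]})
claims = pd.DataFrame({"member_id": [1, 1, 2], "code": ["x", "x", "y"]})
codes = pd.DataFrame({"code": ["x", "y"], "label": ["X-ray", "Yoga"]})

# One "view" is just a list of join steps applied to a starting table.
view = join_all(
    members,
    [
        (claims, {"how": "inner", "on": "member_id"}),
        (codes, {"how": "left", "on": "code"}),
    ],
)
print(view)
```

Each view then becomes a short list of steps passed to the same helper, so adding a view does not mean writing another chain of joins by hand. Whether to drop duplicates after every step, as this sketch does, depends on the data.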

