using melt function in groupby for large data sets in python

Problem description

I have a data frame with 1,782,568 distinct groups.

When I melt that data at the grouping level, my kernel hangs.

So I decided to melt the data group by group and then combine the results sequentially.

For that I wrote the following function.

import pandas as pd

def split(df, key):
    df2 = pd.DataFrame()
    keys = df[key].drop_duplicates()
    for i in range(keys.shape[0]):
        grp_key = tuple(keys.iloc[i, :])
        df1 = (df.groupby(key, as_index=False)
                 .get_group(grp_key)
                 .reset_index(drop=True))
        df2 = df2.append(df1.groupby(key, as_index=False)
                            .apply(pd.melt, id_vars=key)
                            .reset_index()).dropna()
        df2 = df2.drop(grep('level', df2.columns), axis=1)
    return df2

Here grep is a user-defined function of mine; it works like the grep function in R.
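The poster's grep helper is not shown; a minimal sketch of what such an R-style grep might look like in Python (the implementation below is a guess, not the original):

```python
import re

def grep(pattern, columns):
    # Return the labels in `columns` whose text matches `pattern`,
    # similar to R's grep(pattern, x, value = TRUE).
    return [c for c in columns if re.search(pattern, str(c))]

# e.g. grep('level', df2.columns) picks out the 'level_0', 'level_1', ...
# columns that reset_index() leaves behind.
```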

For df I pass the data frame, and for key I pass the grouping keys as a list.

But this function also takes a very long time to run.

Can anyone help me improve its performance?

Thanks in advance.

Tags: python, pandas-groupby, melt

Solution
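One possible direction (a sketch, not an accepted answer): pd.melt already repeats the id_vars columns for every value column, so grouping before melting is unnecessary, and melting the whole frame once avoids both the Python-level loop over 1.7 million groups and the repeated DataFrame.append calls (each append copies the accumulated frame, making the loop quadratic). A minimal illustration with made-up column names:

```python
import pandas as pd

# Hypothetical small frame with two grouping keys (names are illustrative).
df = pd.DataFrame({
    'g1': ['a', 'a', 'b'],
    'g2': [1, 1, 2],
    'x':  [10, 20, 30],
    'y':  [40, 50, 60],
})

key = ['g1', 'g2']

# One vectorised melt over the whole frame: the key columns are carried
# through for every row, so no per-group loop is needed.
long_df = pd.melt(df, id_vars=key).dropna()
```

Whether this fits depends on memory: the melted frame has one row per (row, value-column) pair, so if the full long format does not fit in RAM, melting in chunks of groups and concatenating once at the end with pd.concat (rather than appending inside the loop) is the usual compromise.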

