Applying pandas operations to all dictionary keys

Problem description

Given a pandas DataFrame df:

           paper    reference   count
9384155    p25      r50         1
7434371    p98      r9          78
7433400    p7       r27         5
7431765    p101     r91         501
7422256    p22      r5          91
...

I created a dictionary of sub-DataFrames of df, keyed by the value of count:

df_dict={key:df[df['count']==key] for key in df['count'].unique()}
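As a side note, the same dictionary can probably be built in a single pass with groupby, which yields (key, sub-DataFrame) pairs directly. The following is only a sketch; the sample rows and the names df_dict_filter and df_dict_groupby are made up for illustration.

```python
import pandas as pd

# Hypothetical sample data with the same columns as df in the question.
df = pd.DataFrame({
    "paper": ["p1", "p2", "p3", "p4", "p3"],
    "reference": ["r1", "r1", "r1", "r1", "r9"],
    "count": [1, 1, 2, 2, 2],
})

# The construction from the question: one boolean filter per unique count.
df_dict_filter = {key: df[df["count"] == key] for key in df["count"].unique()}

# Alternative: let groupby split the frame once and collect the pieces.
df_dict_groupby = {key: sub for key, sub in df.groupby("count")}
```

Both dictionaries hold the same keys, and each sub-DataFrame keeps the original row index.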

For each sub-DataFrame in df_dict, I want to apply the following operations:

df_dict[i] = df_dict[i].drop(['count'], axis=1)

pairs = df_dict[i].merge(df_dict[i], on=["reference"])
pairs = pairs[pairs["paper_x"] < pairs["paper_y"]]
pairs = pairs.groupby(["paper_x", "paper_y"]).count().reset_index()
pairs.columns = ["paper1", "paper2", "common"]

refs = df_dict[i].groupby(["paper"]).count().reset_index()
refs.columns = ["paper", "freq"]

result = pairs.merge(refs, how="left", left_on="paper1", right_on="paper")
result = result.merge(refs, how="left", left_on="paper2", right_on="paper")
result = result[["paper1", "freq_x", "paper2", "freq_y", "common"]]
result.columns = ["paper1", "freq1", "paper2", "freq2", "common"]

Note that I created the dictionary so that the operations can be applied separately to each DataFrame with a different count value.

The count column in df does not necessarily contain every integer value, so a KeyError may be raised and needs to be ignored. I tried running a for loop:

for i in range(df['count'].max()):
  try:
    c = df_dict[i]
    c = c.drop(['count'], axis=1)
    pairs = c.merge(c,on=['reference'])
    pairs = pairs[pairs["paper_x"] < pairs["paper_y"]]
    pairs = pairs.groupby(["paper_x", "paper_y"]).count().reset_index() 
    pairs.columns = ["paper1", "paper2", "common"]    
    refs = c.groupby(["paper"]).count().reset_index()
    refs.columns = ["paper", "freq"]
    result = pairs.merge(refs, how="left", left_on="paper1", right_on="paper")
    result = result.merge(refs, how="left", left_on="paper2", right_on="paper")
    result = result[["paper1", "freq_x", "paper2", "freq_y", "common"]]
    result.columns = ["paper1", "freq1", "paper2", "freq2", "common"]
  except KeyError:
    continue

However, this returns one large DataFrame, and its values are incorrect.

Ideally, I would like a set of result DataFrames (separated by count, the same way they are in df_dict).
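Two things are visible in the loop as written: result is reassigned on every pass, so only the output of the last successful iteration remains, and range(df['count'].max()) stops one short of the maximum, so the largest count is never visited. The small sketch below illustrates the second point using the count values from the sample rows; the variable names are hypothetical.

```python
# Count values taken from the sample rows shown in the question.
counts = [1, 78, 5, 501, 91]
present = set(counts)

# range(max) yields 0 .. max - 1, so the maximum itself is excluded.
tried = range(max(counts))

# Integers the loop tries that are not keys (each would raise KeyError).
missing_keys = [i for i in tried if i not in present]

# Keys that exist but are never tried by the loop.
never_tried = [k for k in sorted(present) if k not in tried]
print(never_tried)
```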

Tags: python, pandas, dataframe

Solution
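One possible approach, given as a minimal sketch rather than a definitive answer: wrap the per-group steps from the question in a function (here given the hypothetical name pair_stats), iterate over the keys that actually exist in df_dict instead of over range(...), and store each output in a second dictionary (hypothetically named results) so that no iteration overwrites the previous one. The sample data is made up for illustration.

```python
import pandas as pd


def pair_stats(sub: pd.DataFrame) -> pd.DataFrame:
    """Run the question's pairing steps on one sub-DataFrame."""
    c = sub.drop(columns=["count"])

    # Self-join on reference, keep each unordered paper pair once,
    # then count how many references each pair shares.
    pairs = c.merge(c, on="reference")
    pairs = pairs[pairs["paper_x"] < pairs["paper_y"]]
    pairs = pairs.groupby(["paper_x", "paper_y"]).count().reset_index()
    pairs.columns = ["paper1", "paper2", "common"]

    # Number of references per paper within this sub-DataFrame.
    refs = c.groupby("paper").count().reset_index()
    refs.columns = ["paper", "freq"]

    # Attach the per-paper frequency to both sides of each pair.
    result = pairs.merge(refs, how="left", left_on="paper1", right_on="paper")
    result = result.merge(refs, how="left", left_on="paper2", right_on="paper")
    result = result[["paper1", "freq_x", "paper2", "freq_y", "common"]]
    result.columns = ["paper1", "freq1", "paper2", "freq2", "common"]
    return result


# Hypothetical sample data with the same columns as df in the question.
df = pd.DataFrame({
    "paper": ["p1", "p2", "p1", "p2", "p3", "p4", "p3"],
    "reference": ["r1", "r1", "r2", "r2", "r1", "r1", "r9"],
    "count": [1, 1, 1, 1, 2, 2, 2],
})

df_dict = {key: df[df["count"] == key] for key in df["count"].unique()}

# One result DataFrame per count value, keyed the same way as df_dict.
results = {key: pair_stats(sub) for key, sub in df_dict.items()}
```

Because only existing keys are visited, no try/except around KeyError is needed, and results[1], results[2], and so on can be inspected independently.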

