Why does using .apply on a pandas DataFrame give incorrect results when my loop version works?

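Problem description

I have two pandas DataFrames:

  1. df_topics_temp contains a matrix with a column id
  2. df_mapping contains a mapping of id to a parentID

I am trying to populate the column parent.id in df_topics_temp with the parentID from df_mapping.

I have written a solution using a loop. It is very cumbersome, but it works. My solution using pandas .apply on df_topics_temp does not work.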

Solution 1 (works):


def isnan(value):
  try:
      import math
      return math.isnan(float(value))
  except:
      return False

for x in range(0, df_topics_temp['id'].count()):
    topic_id_loop = df_topics_temp['topic.id'].iloc[x]
    mapping_row = df_mapping[df_mapping['id'] == topic_id_loop]
    parent_id = mapping_row['parentId'].iloc[0]
    
    if isnan(parent_id):
        df_topics_temp['parent.id'].iloc[x] = mapping_row['id'].iloc[0]
    else:     
        df_topics_temp['parent.id'].iloc[x] = topic_id_loop

Solution 2 (does not work):


def map_function(x):
        df_topics_temp = df_mapping.loc[df_mapping['id'] == x]
        temp = df_topics_temp['parentId'].iloc[0]
        return temp

df_topics_temp['parent.id'] = df_topics_temp['topic.id'].apply(map_function)

df_topics_temp.head() 

The second solution (pandas .apply) does not populate parent.id in df_topics_temp.

Thanks for your help.

Update 1

<ipython-input-68-a2e8d9a21c26> in map_function(row)
      1 def map_function(row):
----> 2         row['parent.id'] = df_mapping.loc[df_mapping['id']==row['topic.id']]['parentId'].values[0]
      3         return row

IndexError: ('index 0 is out of bounds for axis 0 with size 0', 'occurred at index 190999')

Tags: python, pandas, dataframe

Solution


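If I understand correctly, DataFrame.apply with axis=1 takes a row and returns a row, so you want your function to return a row. Yours returns a single value. The IndexError in your update also suggests that some topic.id values have no matching row in df_mapping (the filtered result has size 0), so the function needs to handle that case. For example: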

# Set up the DataFrames
import pandas as pd
import numpy as np
df1 = pd.DataFrame.from_dict({'name':['alice','bob'],'id':[1,2]})
mapping = pd.DataFrame.from_dict({'id':[1,2,3,4],'parent_id':[100,200,100,200]})

# Mapping function: receives one row and returns it with parent_id filled in
def f(row):
    if any(mapping['id']==row['id']):
        row['parent_id'] = mapping.loc[mapping['id']==row['id']]['parent_id'].values[0]
    else: # missing value
        row['parent_id'] = np.nan
    return row

df1.apply(f,axis=1)
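
As an alternative to the row-wise apply above, the same lookup can probably be done without a Python-level function by using Series.map. The sketch below is not taken from the question. The sample data is hypothetical, the column names follow the question, and it assumes every id appears at most once in df_mapping.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df_topics_temp = pd.DataFrame({"topic.id": [1, 2, 5]})
df_mapping = pd.DataFrame({"id": [1, 2, 3], "parentId": [100, np.nan, 300]})

# Build an id -> parentId lookup Series (assumes ids in df_mapping are unique).
lookup = df_mapping.set_index("id")["parentId"]

# Series.map yields NaN for ids that are absent from the lookup instead of raising.
df_topics_temp["parent.id"] = df_topics_temp["topic.id"].map(lookup)

# Ids with no row in df_mapping, i.e. the ones that lead to the IndexError.
has_match = df_topics_temp["topic.id"].isin(df_mapping["id"])
missing_ids = df_topics_temp.loc[~has_match, "topic.id"].tolist()
```

Here missing_ids lists the topic ids that have no counterpart in df_mapping, which is a quick way to confirm the cause of the IndexError. If topics without a parent should fall back to their own id, which is only a guess at the intent of the loop version, calling .fillna(df_topics_temp["topic.id"]) on the mapped column is one option.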
