Pandas lookup within groupby dataframe

Problem description

I have the following df:

import pandas as pd

data={'Name':['A','A','A','A','A','A','A','A','A','B','B','B','B','B','B','B','B','B','B','B','B','C','C','C','C','C','C','C','C'],
'Sales':['327','255','977','211','146','183','138','142','156','208','195','224','181','351','166','173','320','197','311','327','245','186','362','391','604','2320','2230','0','0'],
'Price':['10','11','12','13','14','15','16','17','18','30','31','32','33','34','35','36','37','38','39','40','41','60','61','62','63','64','65','66','67'],
'Second_highest_Sales':['','255','327','327','327','327','327','327','327','','195','208','208','224','224','224','320','320','320','327','327','','186','362','391','604','2230','2230','2230']}

data=pd.DataFrame(data)

I am looking to get the corresponding 'Price' for 'Second_highest_Sales', matched against 'Sales' within each group (Name). The result would look like:

result={
'Name':['A','A','A','A','A','A','A','A','A','B','B','B','B','B','B','B','B','B','B','B','B','C','C','C','C','C','C','C','C'],
'Sales':['327','255','977','211','146','183','138','142','156','208','195','224','181','351','166','173','320','197','311','327','245','186','362','391','604','2320','2230','0','0'],
'Price':['10','11','12','13','14','15','16','17','18','30','31','32','33','34','35','36','37','38','39','40','41','60','61','62','63','64','65','66','67'],
'Second_highest_Sales':['','255','327','327','327','327','327','327','327','','195','208','208','224','224','224','320','320','320','327','327','','186','362','391','604','2230','2230','2230'],
'2nd_Highest_Price':['','11','10','10','10','10','10','10','10','','31','30','30','32','32','32','37','37','37','40','40','','60','61','62','63','65','65','65']}

result=pd.DataFrame(result)

I tried .shift() and .lookup() but get an index error on a groupby dataframe. Is there an easier way to do this than writing a custom function?

Tags: python, pandas, pandas-groupby, rolling-computation

Solution


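I would

  • Take the 'Second_highest_Sales' series and drop the empty values
  • Retrieve the corresponding names
  • Re-index the DataFrame by Name and Sales
  • Look up the Price for each corresponding Name and Second_highest_Sales pair
  • Fill the new column with the retrieved values wherever Second_highest_Sales is defined

In code, this would look like: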

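# Second_highest_Sales values, skipping the empty first row of each group
shs = data['Second_highest_Sales']
shs = shs[shs != '']
# Names of the rows that have a Second_highest_Sales value (label-based selection)
shs_names = data.loc[shs.index, 'Name']
# Look up the Price of each (Name, Sales) pair; assumes every looked-up pair occurs only once in data
prices = data.set_index(['Name', 'Sales']).loc[list(zip(shs_names, shs))]['Price']
result = data.copy()
result['Second_highest_Price'] = ''
result.loc[shs.index, 'Second_highest_Price'] = prices.values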
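The same lookup can also be written as a left merge of the frame with itself. This is only a sketch of an alternative, not part of the answer above. It uses the first three rows of groups A and B from the question as a shortened sample, and it assumes that keeping the first Price is acceptable when a (Name, Sales) pair is repeated. The `lookup` name is introduced here for illustration.

```python
import pandas as pd

# Shortened sample: the first three rows of groups A and B from the question
data = pd.DataFrame({
    'Name': ['A', 'A', 'A', 'B', 'B', 'B'],
    'Sales': ['327', '255', '977', '208', '195', '224'],
    'Price': ['10', '11', '12', '30', '31', '32'],
    'Second_highest_Sales': ['', '255', '327', '', '195', '208'],
})

# Lookup table with one row per (Name, Sales) pair.
# Assumption: if a pair is repeated, the first Price is kept.
lookup = (
    data[['Name', 'Sales', 'Price']]
    .drop_duplicates(['Name', 'Sales'])
    .rename(columns={'Sales': 'Second_highest_Sales',
                     'Price': 'Second_highest_Price'})
)

# A left merge keeps every original row in its original order;
# rows whose Second_highest_Sales is '' find no match and get NaN
result = data.merge(lookup, on=['Name', 'Second_highest_Sales'], how='left')
result['Second_highest_Price'] = result['Second_highest_Price'].fillna('')

print(result)
```

Because the lookup table is de-duplicated before merging, the result has the same number of rows as `data`, which the MultiIndex `.loc` approach only guarantees when the looked-up pairs are unique.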
