python - How do I get the median price of the previous year?
Question
Given the data below, how can I get the previous year's median squaremeterPrice?
city_code createdYear squaremeterPrice squaremeterPrice_grouped_city_for_the_current_year
0 26 2014 33273 39632.0
1 26 2014 37500 39632.0
2 26 2014 47428 39632.0
3 26 2014 39554 39632.0
4 26 2014 38893 39632.0
5 26 2013 34231 28841.0
6 26 2014 34344 39632.0
7 26 2014 44574 39632.0
8 26 2014 25202 39632.0
9 26 2014 39632 39632.0
10 26 2014 44504 39632.0
11 26 2013 23451 28841.0
...
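To reproduce the problem, here is a minimal sketch that builds a DataFrame from the 12 rows visible above. The rows hidden behind `...` are unknown and are left out, so medians computed from this sample can differ from the values in the question's table.

```python
import pandas as pd

# Only the 12 rows shown in the question; the elided rows are not included.
df = pd.DataFrame({
    "city_code": [26] * 12,
    "createdYear": [2014, 2014, 2014, 2014, 2014, 2013,
                    2014, 2014, 2014, 2014, 2014, 2013],
    "squaremeterPrice": [33273, 37500, 47428, 39554, 38893, 34231,
                         34344, 44574, 25202, 39632, 44504, 23451],
})

# Median price per year on this sample.
print(df.groupby("createdYear")["squaremeterPrice"].median())
```

On these rows the 2013 median is 28841.0 and the 2014 median is 39223.5.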
To get squaremeterPrice_grouped_city_for_the_current_year I used the following code:
# adding the median sqm price per city
median_squaremeterPrice_per_city = df.groupby(["city_code"])["squaremeterPrice"].median().to_frame("squaremeterPrice_grouped_city_for_the_current_year").reset_index()
df = df.merge(median_squaremeterPrice_per_city, left_on=["city_code"], right_on=["city_code"])
df
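The following sketch runs the code above on the visible sample rows (an assumption, since the full data is elided). Grouping by `city_code` alone pools every year together, so all rows of a city receive the same median, and the year never enters the calculation:

```python
import pandas as pd

# The 12 visible sample rows from the question.
df = pd.DataFrame({
    "city_code": [26] * 12,
    "createdYear": [2014, 2014, 2014, 2014, 2014, 2013,
                    2014, 2014, 2014, 2014, 2014, 2013],
    "squaremeterPrice": [33273, 37500, 47428, 39554, 38893, 34231,
                         34344, 44574, 25202, 39632, 44504, 23451],
})

# Same idea as the question: one median per city, across all years.
median_per_city = (df.groupby(["city_code"])["squaremeterPrice"]
                   .median()
                   .to_frame("squaremeterPrice_grouped_city_for_the_current_year")
                   .reset_index())
df = df.merge(median_per_city, on="city_code")

# A single value for every row, whatever the createdYear.
print(df["squaremeterPrice_grouped_city_for_the_current_year"].unique())
```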
The expected output is:
city_code createdYear squaremeterPrice squaremeterPrice_grouped_city_for_the_current_year squaremeterPrice_grouped_city_for_1_year_prior
0 26 2014 33273 39632.0 28841.0
1 26 2014 37500 39632.0 28841.0
2 26 2014 47428 39632.0 28841.0
3 26 2014 39554 39632.0 28841.0
4 26 2014 38893 39632.0 28841.0
5 26 2013 34231 28841.0 whatever was the 2012 price
6 26 2014 34344 39632.0 28841.0
7 26 2014 44574 39632.0 28841.0
8 26 2014 25202 39632.0 28841.0
9 26 2014 39632 39632.0 28841.0
10 26 2014 44504 39632.0 28841.0
11 26 2013 23451 28841.0 whatever was the 2012 price
...
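Solution
Instead, group by both columns, city_code and createdYear, and aggregate with median. For the previous year, add 1 to the createdYear level of the resulting MultiIndex. Finally, use DataFrame.join to add the new columns: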
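The key step is the year shift. Here is a minimal sketch on a hypothetical two-entry median Series (`s` and its values are made up for illustration and chosen to match the medians of the visible sample rows). After renaming the year level with `y + 1`, the 2013 median is stored under 2014, so joining on `createdYear` returns the previous year's value:

```python
import pandas as pd

# Hypothetical medians indexed by (city_code, createdYear).
s = pd.Series(
    [28841.0, 39223.5],
    index=pd.MultiIndex.from_tuples([(26, 2013), (26, 2014)],
                                    names=["city_code", "createdYear"]),
    name="median",
)

# Relabel only the year level (level=1): 2013 -> 2014, 2014 -> 2015.
prev = s.rename(lambda y: y + 1, level=1)
print(prev)
```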
# median price per (city_code, createdYear) pair
median_squaremeterPrice_per_city_and_year = (df.groupby(["city_code", "createdYear"])["squaremeterPrice"]
                                             .median()
                                             .rename('squaremeterPrice_grouped_city_for_the_current_year'))
# add 1 to the year level, so each median is keyed by the following year
median_squaremeterPrice_per_city_and_prev_year = (median_squaremeterPrice_per_city_and_year
                                                  .rename(lambda x: x + 1, level=1)
                                                  .rename('squaremeterPrice_grouped_city_for_the_prev_year'))
print(median_squaremeterPrice_per_city_and_prev_year)
# join both Series onto df, matching the (city_code, createdYear) columns to their MultiIndex
df1 = (df.join(median_squaremeterPrice_per_city_and_year, on=['city_code', 'createdYear'])
         .join(median_squaremeterPrice_per_city_and_prev_year, on=['city_code', 'createdYear']))
print(df1)
city_code createdYear squaremeterPrice \
0 26 2014 33273
1 26 2014 37500
2 26 2014 47428
3 26 2014 39554
4 26 2014 38893
5 26 2013 34231
6 26 2014 34344
7 26 2014 44574
8 26 2014 25202
9 26 2014 39632
10 26 2014 44504
11 26 2013 23451
squaremeterPrice_grouped_city_for_the_current_year \
0 39223.5
1 39223.5
2 39223.5
3 39223.5
4 39223.5
5 28841.0
6 39223.5
7 39223.5
8 39223.5
9 39223.5
10 39223.5
11 28841.0
squaremeterPrice_grouped_city_for_the_prev_year
0 28841.0
1 28841.0
2 28841.0
3 28841.0
4 28841.0
5 NaN
6 28841.0
7 28841.0
8 28841.0
9 28841.0
10 28841.0
11 NaN
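The following is an alternative sketch, not the answer's own method, that should produce the same two columns using merge. `transform('median')` broadcasts each (city, year) median back to its rows. The previous-year column comes from a left merge against the per-year medians after `createdYear` has been incremented by 1. Like the `+1` rename above, this matches the calendar year immediately before, not the most recent earlier year in the data. A year with no data for the year before it (2013 here, since there are no 2012 rows) therefore gets NaN.

```python
import pandas as pd

# The 12 visible sample rows from the question.
df = pd.DataFrame({
    "city_code": [26] * 12,
    "createdYear": [2014, 2014, 2014, 2014, 2014, 2013,
                    2014, 2014, 2014, 2014, 2014, 2013],
    "squaremeterPrice": [33273, 37500, 47428, 39554, 38893, 34231,
                         34344, 44574, 25202, 39632, 44504, 23451],
})
keys = ["city_code", "createdYear"]

# Current-year median, repeated on every row of its (city, year) group.
df["squaremeterPrice_grouped_city_for_the_current_year"] = (
    df.groupby(keys)["squaremeterPrice"].transform("median"))

# One row per (city, year) with its median; shift the year forward by one
# so that year Y's median lines up with rows from year Y + 1.
prev = (df.groupby(keys, as_index=False)["squaremeterPrice"].median()
          .rename(columns={"squaremeterPrice": "squaremeterPrice_grouped_city_for_the_prev_year"}))
prev["createdYear"] += 1

# A left merge keeps every original row; unmatched years get NaN.
df1 = df.merge(prev, on=keys, how="left")
print(df1)
```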