python - Pandas Dataframe, adding years and calculations for future years from another dataframe
问题描述
here is a sample piece of the current dataframe, it is the first day and all 24 hours. The whole dataframe is a year broken down into 24 hour segments
+-------+-----+------+---------------+-------------------+
| month | day | hour | project_name | hourly_production |
+-------+-----+------+---------------+-------------------+
| 1 | 1 | 1 | Blah | 0 |
| 1 | 1 | 2 | Blah | 0 |
| 1 | 1 | 3 | Blah | 0 |
| 1 | 1 | 4 | Blah | 0 |
| 1 | 1 | 5 | Blah | 0 |
| 1 | 1 | 6 | Blah | 0 |
| 1 | 1 | 7 | Blah | 0 |
| 1 | 1 | 8 | Blah | 1.44 |
| 1 | 1 | 9 | Blah | 40.42 |
| 1 | 1 | 10 | Blah | 49.13 |
| 1 | 1 | 11 | Blah | 47.57 |
| 1 | 1 | 12 | Blah | 43.77 |
| 1 | 1 | 13 | Blah | 42.33 |
| 1 | 1 | 14 | Blah | 45.25 |
| 1 | 1 | 15 | Blah | 48.54 |
| 1 | 1 | 16 | Blah | 46.34 |
| 1 | 1 | 17 | Blah | 18.35 |
| 1 | 1 | 18 | Blah | 0 |
| 1 | 1 | 19 | Blah | 0 |
| 1 | 1 | 20 | Blah | 0 |
| 1 | 1 | 21 | Blah | 0 |
| 1 | 1 | 22 | Blah | 0 |
| 1 | 1 | 23 | Blah | 0 |
| 1 | 1 | 24 | Blah | 0 |
+-------+-----+------+---------------+-------------------+
Here is my current code:
df0_partition_1 = df0[['project_id', 'start_date', 'degradation_factor', 'snapshot_datetime']]
df0_partition_2 = df0_partition_1.groupby(['project_id', 'start_date', 'degradation_factor_solar', 'snapshot_datetime']).size().reset_index()
df2_partition_1 = df2[df2['duration_year']==df2['duration_year'].max()]
df2_partition_2 = df2_partition_1.groupby(['project_id', 'snapshot_datetime']).size().reset_index()
df_merge = pd.merge(df0_partition_2, df2_partition_2, on=['project_id', 'snapshot_datetime'], how='left')
df_merge.rename(columns={'0_y':'duration_year'}, inplace=True)
df_parts = df_merge[['project_id', 'start_date', 'duration_year', 'degradation_factor_solar', 'snapshot_datetime']].dropna()
for index, row in df_parts.iterrows():
df1_filtered = df1[(df1['project_id'] == row['project_id']) &
(df1['snapshot_datetime'] == row['snapshot_datetime'])]
df1_filtered['year'] = pd.to_datetime(row['start_date']).year
for y in range(1, int(row['duration_year'])+1):
df_stg = df_stg = df1_filtered[[df1_filtered['year'] + y, df1_filtered['hourly_production']*(1-(float(row.loc['degradation_factor_solar'].strip('%'))*y/100))]]
df_final = df1_filtered.append(df_stg)
I need help figuring out how to create the final dataframe. The final dataframe is an append of future years with the degredation factor applied to the hourly production. I am not sure how to increment the year in the DF and apply the degradation factor and then append.
right now this gives me TypeError: 'Series' objects are mutable, thus they cannot be hashed
解决方案
原来我需要做一个 df.copy 以停止弄乱我的原始数据框,从而有一个有效的附加。
df0_partition_1 = df0[['project_id', 'start_date', 'degradation_factor_solar', 'snapshot_datetime']]
df0_partition_2 = df0_partition_1.groupby(['project_id', 'start_date', 'degradation_factor', 'snapshot_datetime']).size().reset_index()
df2_partition_1 = df2[df2['duration_year']==df2['duration_year'].max()]
df2_partition_2 = df2_partition_1.groupby(['project_id', 'snapshot_datetime']).size().reset_index()
df_merge = pd.merge(df0_partition_2, df2_partition_2, on=['project_id', 'snapshot_datetime'], how='left')
df_merge.rename(columns={'0_y':'duration_year'}, inplace=True)
df_parts = df_merge[['project_id', 'start_date', 'duration_year', 'degradation_factor', 'snapshot_datetime']].dropna()
for index, row in df_parts.iterrows():
df1_filtered = df1[(df1['project_id'] == row['project_id']) &
(df1['snapshot_datetime'] == row['snapshot_datetime'])]
df1_filtered['year'] = pd.to_datetime(row['start_date']).year
df1_filtered.reset_index(inplace=True, drop=True)
df1_filtered.drop(columns='project_name', inplace=True)
df_stg_1 = df1_filtered.copy()
for y in range(2, int(row['duration_year'])+1):
year = df1_filtered['year']+(y-1)
hourly_production = df1_filtered['hourly_production']
df_stg_1['year'] = year
df_stg_1['hourly_production'] = hourly_production*(1-(float(row.loc['degradation_factor_solar'].strip('%'))*(y-1)/100))
df_stg_2 = df1_filtered.append(df_stg_1)
df_final = df1_filtered.append(df_stg_2)
df_final.reset_index(inplace=True, drop=True)
推荐阅读
- keras - 如何构建每个二元分类器都是 Keras 神经网络的分类器链?
- pyspark - geolocator.reverse() TypeError: reverse() 接受 2 个位置参数,但 3 个在 pyspark 中给出
- python - 如何使用 Python read() 一次读取大文件
- javascript - EventListener beforeunload 和 JavaScript 全局变量
- amazon-web-services - Redshift 更改查询
- c# - C# Google Cloud Text-To-Speech Being - Wavenet 问题
- python - vtk 中是否有可用的颜色图预设?
- oracle - 从未格式化的文本文件中提取语句
- c# - 手机上的应用程序显示的内容与 Unity 和 Nox 中相同的代码不同
- javascript - 有没有办法为Javascript中的条件保留线程。?