python - Create Matrix (as in 2 way table) from 3-column pandas DataFrame
Problem description
I have a dataframe like this,
datetime id value
0 2021-02-21 15:43:00 154 0.102677
1 2021-02-21 15:57:00 215 0.843945
2 2021-02-21 00:31:00 126 0.402851
3 2021-02-21 16:38:00 61 0.138945
4 2021-02-21 05:11:00 124 0.865435
.. ... ... ...
115 2021-02-21 21:54:00 166 0.108299
116 2021-02-21 17:39:00 192 0.129267
117 2021-02-21 01:56:00 258 0.300448
118 2021-02-21 20:35:00 401 0.119043
119 2021-02-21 09:16:00 192 0.587173
which I can create by issuing,
import numpy as np
import pandas as pd
from numpy import random
#all minutes of the day, ordered, unique
d = pd.date_range("2021-02-21 00:00:00","2021-02-21 23:59:59", freq="1min")
d2 = pd.Series(d).sample(120,replace=True)
ids = random.randint(1,500,size=d2.shape[0])
df = pd.DataFrame({'datetime':d2,'id':ids,'value':random.random(size=d2.shape[0])})
df.reset_index(inplace=True,drop=True)
and I want to turn it into a matrix with the minute of the day on one axis and the id on the other,
so that the result has shape 1440 x unique(ids).shape[0].
Please note that the output matrix must have 1440 rows even if some minutes do not appear in the dataframe.
I can do it like this, but it takes a VERY long time. How can I do it better?
# all ids, unique
uniqueIds = df.id.unique()
idsN = uniqueIds.shape[0]
# one row per minute of the day, one column per unique id
objectiveMatrix = np.zeros([1440, idsN])
for index, row in df.iterrows():
    a = np.where(row.id == uniqueIds)[0]  # column: position of this id
    b = np.where(row.datetime == d)[0]    # row: position of this minute in d
    objectiveMatrix[b, a] = row.value
Solution
This is a so-called pivot. Pandas has pivot, pivot_table, and set_index/unstack for this. For more details, see this excellent guide. As a starter, you can try:
# extract the time of day as an 'HH-MM' string
df['minute'] = df['datetime'].dt.strftime('%H-%M')
output = df.pivot_table(index='minute', columns='id', values='value')
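The pivot above only produces rows for minutes that actually occur in the data, while the question asks for all 1440. One possible way to close that gap, sketched here under assumptions rather than taken from the original answer, is to key the rows on an integer minute of the day and reindex onto the full 0..1439 range. The small frame below is made-up sample data standing in for the question's df. Note that pivot_table averages duplicate (minute, id) pairs, whereas the loop in the question keeps the last value written.

```python
import pandas as pd

# Made-up sample data standing in for the question's df (assumption).
df = pd.DataFrame({
    "datetime": pd.to_datetime([
        "2021-02-21 00:31:00",
        "2021-02-21 15:43:00",
        "2021-02-21 15:43:00",
        "2021-02-21 23:59:00",
    ]),
    "id": [126, 154, 61, 126],
    "value": [0.40, 0.10, 0.25, 0.90],
})

# Minute of the day as an integer in 0..1439.
minute_of_day = df["datetime"].dt.hour * 60 + df["datetime"].dt.minute

matrix = (
    df.assign(minute=minute_of_day)
      .pivot_table(index="minute", columns="id", values="value", aggfunc="mean")
      .reindex(range(1440))   # add rows for minutes missing from the data
      .fillna(0.0)            # zeros for empty cells, like np.zeros in the question
)

# Plain NumPy array if the labels are no longer needed.
objective_matrix = matrix.to_numpy()
```

Here matrix keeps the minute and id labels, so a cell can be looked up as matrix.loc[minute, id]; the columns come out sorted by id rather than in order of first appearance.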