Is there a more efficient way to aggregate a dataset and calculate frequency in Python or R?

Question

I have a dataset [0, 1, 1, 2] and I want to aggregate it. To do this, I currently have to compute the per-row 'frequency' (1/4, i.e. 1 divided by the number of rows) by hand and put it into a DataFrame, then sum it within each group. Here is the code.

>>> df = pd.DataFrame({'value':[0, 1, 1, 2],
...             'frequency':1/4})
>>> df.groupby('value').sum()
       frequency
value           
0           0.25
1           0.50
2           0.25

Is there a more efficient way to aggregate the dataset and calculate the frequency automatically in Python or R?

Tags: python, r, pandas

Answer


In R (with the data in a data frame called dat):

prop.table(table(dat$value))

   0    1    2 
0.25 0.50 0.25 

In Python, with NumPy:

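import numpy as np
import pandas as pd
# Unique values and the number of times each one occurs
u, c = np.unique(df.value, return_counts=True)
# Divide the counts by their total to get relative frequencies
pd.Series(c / c.sum(), index=u)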
0    0.25
1    0.50
2    0.25
dtype: float64
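
As an alternative that stays within pandas, the same relative frequencies can probably be obtained with Series.value_counts and its normalize=True option. This is a minimal sketch, not part of the original answer; it assumes the data sits in a column named 'value' as in the question, and the variable name freq is only illustrative.

```python
import pandas as pd

# Same data as in the question; no manual 'frequency' column is needed
df = pd.DataFrame({'value': [0, 1, 1, 2]})

# normalize=True returns proportions instead of raw counts;
# sort_index() orders the result by value rather than by frequency
freq = df['value'].value_counts(normalize=True).sort_index()

print(freq.to_dict())  # {0: 0.25, 1: 0.5, 2: 0.25}
```

The sort_index() call is there because value_counts orders its result by descending count by default, which would put 1 first.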
