python - Is there a more efficient way to aggregate a dataset and calculate frequency in Python or R?
Problem description
I have a dataset [0, 1, 1, 2] that I want to aggregate. To do this, I currently have to compute the per-row 'frequency' (1/4) by hand and put it into a DataFrame myself. Here is the code:
>>> import pandas as pd
>>> df = pd.DataFrame({'value': [0, 1, 1, 2],
...                    'frequency': 1/4})
>>> df.groupby('value').sum()
       frequency
value
0           0.25
1           0.50
2           0.25
Is there a more efficient way to aggregate the dataset and calculate the frequency automatically in Python or R?
Solution
In R:
prop.table(table(dat$value))
   0    1    2
0.25 0.50 0.25
In Python, with NumPy:
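import numpy as np
import pandas as pd
# unique values and the number of times each one occurs
u, c = np.unique(df.value, return_counts=True)
# divide each count by the total to get relative frequencies
pd.Series(c / c.sum(), index=u)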
0 0.25
1 0.50
2 0.25
dtype: float64
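As an alternative that stays entirely in pandas, Series.value_counts with normalize=True should give the same relative frequencies in one call. This is a minimal sketch, not part of the original answer; the variable names values and freq are made up for illustration, and the data is the list from the question.

```python
import pandas as pd

# Same data as in the question (variable name is illustrative)
values = pd.Series([0, 1, 1, 2])

# normalize=True returns relative frequencies instead of raw counts;
# sort_index() orders the result by value rather than by frequency
freq = values.value_counts(normalize=True).sort_index()
print(freq)
```

With this approach there is no need to add a 'frequency' column by hand, since the division by the total count happens inside value_counts.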