How to count pandas.Series ranges when groupby is followed by value_counts()

Problem description

I have data like this:

import pandas as pd

year = ['2010', '2011-2014', '2013', '2012-2016', '2018-present', '2019', '2015-present', '2015']
products = ['A', 'B', 'C', 'D', 'B', 'E', 'F', 'A']
rating = [4, 2, 2, 3, 1, 1, 2, 2]

data = pd.DataFrame({'Products': products, 'Year': year, 'Rating': rating})

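In my analysis I want to convert the year ranges into single-year values (e.g. ['2010', '2011', '2013', '2014', '2015', '2016', '2017', '2018', '2019', '2020']) and have the other columns counted once for every year a range covers. For the example above I want something like this, i.e. one year/product pair for each year in a range: {'2010': 'A', '2011': 'B', '2013': 'B', '2014': 'B', '2013': 'C', '2012': 'D', '2013': 'D', '2014': 'D', '2015': 'D', '2016': 'D', ...}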

I believe I need the opposite of pandas.cut binning, but I don't know how to do that in pandas.
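Going the opposite way from pd.cut means turning each label back into every year it covers. A minimal sketch in plain Python, assuming a label is either a single year or "start-end" where the end may be the word "present", and assuming "present" should be read as 2020 (adjust as needed). The function name expand_years is hypothetical:

```python
def expand_years(label, present=2020):
    """Expand a label like '2011-2014' or '2018-present' into a list of years.

    Hypothetical helper: assumes a label is either a single year or
    'start-end', where end may be the word 'present'.
    """
    start, _, end = label.partition('-')
    if not end:               # single year such as '2010'
        end = start
    elif end == 'present':    # open-ended range, assumed to stop at `present`
        end = present
    return list(range(int(start), int(end) + 1))


print(expand_years('2011-2014'))     # [2011, 2012, 2013, 2014]
print(expand_years('2018-present'))  # [2018, 2019, 2020]
print(expand_years('2010'))          # [2010]
```

Mapping such a function over the Year column gives one list per row; those lists then need to be spread into separate rows.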

Tags: python, pandas, group-by

Solution


Use DataFrame.explode:
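DataFrame.explode (available since pandas 0.25) turns each element of a list-like cell into its own row and repeats the values of the other columns. A minimal illustration on a toy frame; the frame and its values are made up for the example:

```python
import pandas as pd

# Toy frame: each row holds a list of years in the 'Range' column
toy = pd.DataFrame({'Products': ['A', 'B'], 'Range': [[2010], [2011, 2012, 2013]]})

# One output row per list element; 'Products' is repeated and the
# original index label is kept, so label 1 appears three times
exploded = toy.explode('Range')
print(exploded)
```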

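# Extract the start and end of each range from the Year column;
# for a single year the optional To group is left empty (NaN)
y = data['Year'].str.extract(r'(?P<From>\d+)-?(?P<To>\d+|present)?')
# A single year ends where it starts; 'present' is hard-coded as 2020
y['To'] = y['To'].combine_first(y['From']).replace({'present': '2020'})
y = y.astype('int')
# Inclusive range of years for each row
y['Range'] = y.apply(lambda row: range(row['From'], row['To'] + 1), axis=1)

# The explosion: one row per year in the range
data['Range'] = y['Range']
data = data.explode('Range')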

Result:

Products          Year  Rating Range
       A          2010       4  2010
       B     2011-2014       2  2011
       B     2011-2014       2  2012
       B     2011-2014       2  2013
       B     2011-2014       2  2014
       C          2013       2  2013
       D     2012-2016       3  2012
       D     2012-2016       3  2013
       D     2012-2016       3  2014
       D     2012-2016       3  2015
       D     2012-2016       3  2016
       B  2018-present       1  2018
       B  2018-present       1  2019
       B  2018-present       1  2020
       E          2019       1  2019
       F  2015-present       2  2015
       F  2015-present       2  2016
       F  2015-present       2  2017
       F  2015-present       2  2018
       F  2015-present       2  2019
       F  2015-present       2  2020
       A          2015       2  2015
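The exploded frame keeps the original index labels (each source row's label repeats once per year), and the Range column has object dtype. A sketch of an optional clean-up step, assuming a plain integer column and a fresh 0..n-1 index are wanted; ignore_index requires pandas 1.1 or later. It rebuilds the question's data so it runs on its own:

```python
import pandas as pd

year = ['2010', '2011-2014', '2013', '2012-2016', '2018-present', '2019', '2015-present', '2015']
products = ['A', 'B', 'C', 'D', 'B', 'E', 'F', 'A']
rating = [4, 2, 2, 3, 1, 1, 2, 2]
data = pd.DataFrame({'Products': products, 'Year': year, 'Rating': rating})

# Same extraction as the answer; 'present' is treated as 2020
y = data['Year'].str.extract(r'(?P<From>\d+)-?(?P<To>\d+|present)?')
y['To'] = y['To'].combine_first(y['From']).replace({'present': '2020'})
y = y.astype('int64')
data['Range'] = y.apply(lambda row: list(range(row['From'], row['To'] + 1)), axis=1)

# ignore_index=True renumbers rows 0..n-1;
# astype turns the object-dtype Range column into int64
tidy = data.explode('Range', ignore_index=True).astype({'Range': 'int64'})
print(tidy.head())
```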

Rename the columns as needed.
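Since the title asks about counting, one way to continue from the exploded frame is to group by the single year. A sketch that rebuilds the exploded frame with a hypothetical plain-Python helper (expand_years, a swapped-in alternative to the regex extraction, again reading "present" as 2020) so it runs standalone. groupby followed by value_counts gives the number of rows per year and product, and agg(list) gives a year-to-products mapping close to the one sketched in the question:

```python
import pandas as pd

year = ['2010', '2011-2014', '2013', '2012-2016', '2018-present', '2019', '2015-present', '2015']
products = ['A', 'B', 'C', 'D', 'B', 'E', 'F', 'A']
rating = [4, 2, 2, 3, 1, 1, 2, 2]
data = pd.DataFrame({'Products': products, 'Year': year, 'Rating': rating})


def expand_years(label, present=2020):
    # Hypothetical helper: '2011-2014' -> [2011, ..., 2014]; 'present' is read as `present`
    start, _, end = label.partition('-')
    end = present if end == 'present' else int(end or start)
    return list(range(int(start), end + 1))


# One row per single year, with Range stored as a plain integer column
exploded = (data.assign(Range=data['Year'].map(expand_years))
                .explode('Range')
                .astype({'Range': 'int64'}))

# Number of rows per (year, product) pair: groupby followed by value_counts
counts = exploded.groupby('Range')['Products'].value_counts()

# Number of products active in each single year
per_year = exploded['Range'].value_counts().sort_index()

# Year -> list of products, similar to the mapping sketched in the question
by_year = exploded.groupby('Range')['Products'].agg(list).to_dict()
print(by_year[2013])  # ['B', 'C', 'D']
```

Unlike a Python dict literal with repeated keys, by_year keeps every product for a given year in one list.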

