Convert months and years to days in pandas

Problem description

I have a DataFrame with durations given in months and years, and I want to convert them to days.

Name      details
prem      6 months probation included
shaves    3 years 6 months  suspended
geroge    48 hours work time
julvie    4 years 20 days terms included
tiz       80 days work
lamp      44 days work

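Here I want to convert 3 years to 1095 days and 6 months to 186 days (leap years may also be taken into account). I also want to drop all other words, such as "probation included" and "suspended", and put the results in a new column.

Expected result: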

Name      details                           Time
prem      6 months probation included       186 days
shaves    3 years 6 months suspended        1281 days
geroge    48 hours work time                48 hours
julvie    4 years 20 days terms included    1480 days
tiz       80 days work                      80 days
lamp      44 days work                      44 days

Tags: python, regex, pandas, data-cleaning

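Solution

Use Series.str.extract to capture the number together with its unit (years, months, hours or days), then multiply the number by a scalar looked up with Series.map. Fixed scalars are used because no start date is specified (a more precise factor is possible, e.g. 1 year = 365.2564 days). Finally, add the unit conditionally with numpy.where.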

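import numpy as np

# days per unit; hours are left unconverted
d = {'months': 31, 'years': 365, 'hours': 1, 'days': 1}
# capture the first "<number> <unit>" pair of each row
df1 = df['details'].str.extract(r'(\d+)\s+(years|months|hours|days)', expand=True)
df['Time'] = df1[0].astype(float).mul(df1[1].map(d)).astype('Int64').astype(str)

# years, months and days are reported in days; hours keep their own unit
df['Unit'] = np.where(df1[1].isin(['years', 'months', 'days']), ' days', ' ' + df1[1])

df['Time'] += df.pop('Unit')
print(df)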
     Name                      details       Time
0    prem  6 months probation included   186 days
1  shaves            3 years suspended  1095 days
2  geroge           48 hours work time   48 hours
3  julvie       4 years terms included  1460 days
4     tiz                 80 days work    80 days
5    lamp                 44 days work    44 days    
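
The fixed factors (365 and 31) are approximations because no start date is given. If a start date were known, leap years and real month lengths could be counted exactly. The following is only a minimal sketch of that idea: the helper name span_in_days and the default start date 2020-01-01 are assumptions made for illustration, not part of the answer above.

```python
import pandas as pd

def span_in_days(years=0, months=0, days=0, start="2020-01-01"):
    """Count the calendar days covered by a span that begins at `start`.

    `start` is an assumed date; the question does not provide one.
    """
    start = pd.Timestamp(start)
    # DateOffset applies calendar arithmetic, so leap years and
    # month lengths are taken into account
    end = start + pd.DateOffset(years=years, months=months, days=days)
    return (end - start).days

print(span_in_days(years=3, months=6))             # span starting in a leap year
print(span_in_days(years=3, start="2021-01-01"))   # span without a leap day
```

With a start date the result depends on where the span falls in the calendar, so the same "3 years" can give 1095 or 1096 days.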

Edit: If a row can contain several units, you can use:

import pandas as pd

# days per unit
d = {'months': 31, 'years': 365, 'days': 1}

# extract the number in front of each unit and multiply it by the unit's factor
out = {k: df['details'].str.extract(rf'(\d+)\s+{k}', expand=False).astype(float).mul(v)
       for k, v in d.items()}
# join the columns, sum per row and format as days; rows without these units become ''
days = pd.concat(out, axis=1).sum(axis=1).astype(int).astype(str).add(' days').replace('0 days', '')

# extract hours unchanged
hours = df['details'].str.extract(r'(\d+\s+hours)', expand=False).radd(' ').fillna('')

# join together, trimming the leading space left when a row has only hours
df['Time'] = (days + hours).str.strip()
print(df)
     Name                                      details               Time
0    john  2 years 1 months 10 days 15 hours work time  771 days 15 hours
1    prem                  6 months probation included           186 days
2  shaves                  3 years 6 months  suspended          1281 days
3  geroge                           48 hours work time           48 hours
4  julvie               4 years 20 days terms included          1480 days
5     tiz                                 80 days work            80 days
6    lamp                                 44 days work            44 days
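
As a possible alternative to one extract call per unit, Series.str.extractall can collect every number/unit pair in a single pass. This is a sketch under assumptions rather than the answerer's method: the sample frame is rebuilt from the question's data, and the names factors, matches and total_days are introduced here for illustration only.

```python
import pandas as pd

# sample data taken from the question (hours are ignored in this sketch)
df = pd.DataFrame({
    "Name": ["prem", "shaves", "geroge", "julvie"],
    "details": [
        "6 months probation included",
        "3 years 6 months  suspended",
        "48 hours work time",
        "4 years 20 days terms included",
    ],
})

# same approximate factors as in the answer above
factors = {"years": 365, "months": 31, "days": 1}

# one result row per match; the first index level is the original row label
matches = df["details"].str.extractall(r"(?P<n>\d+)\s+(?P<unit>years|months|days)")

total_days = (
    matches["n"].astype(int)
    .mul(matches["unit"].map(factors))      # number * days per unit
    .groupby(level=0).sum()                 # add up all matches of a row
    .reindex(df.index, fill_value=0)        # rows without a match get 0
)
print(total_days.tolist())
```

Rows that mention only hours end up with 0 here, so they would still need the separate hours extraction shown above.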
    
