Why do I get different output when I put 'year' and 'years' in a different order in my code?

Problem description

All I did was swap 'years' and 'year' between the first and second lines of my code.

This is the original column

10+ years    653
< 1 year     249
2 years      243
3 years      235
5 years      202
4 years      191
1 year       177
6 years      163
7 years      127
8 years      108
9 years       72
.              2
Name: Employment.Length, dtype: int64

First example ('years' on the first line, 'year' on the second)

raw_data['Employment.Length'] = raw_data['Employment.Length'].str.replace('years',' ')
raw_data['Employment.Length'] = raw_data['Employment.Length'].str.replace('year',' ')
raw_data['Employment.Length'] = np.where(raw_data['Employment.Length'].str[:2]=='10',10,raw_data['Employment.Length'])
raw_data['Employment.Length'] = np.where(raw_data['Employment.Length'].str[0]=='<',0,raw_data['Employment.Length'])
raw_data['Employment.Length'] = pd.to_numeric(raw_data['Employment.Length'], errors = 'coerce')

Output

10.0    653
0.0     249
2.0     243
3.0     235
5.0     202
4.0     191
1.0     177
6.0     163
7.0     127
8.0     108
9.0      72
Name: Employment.Length, dtype: int64

Second example ('year' on the first line, 'years' on the second)

raw_data_copy['Employment.Length'] = raw_data_copy['Employment.Length'].str.replace('year',' ')
raw_data_copy['Employment.Length'] = raw_data_copy['Employment.Length'].str.replace('years',' ')
raw_data_copy['Employment.Length'] = np.where(raw_data_copy['Employment.Length'].str[:2]=='10',10, raw_data_copy['Employment.Length'])
raw_data_copy['Employment.Length'] = np.where(raw_data_copy['Employment.Length'].str[0]=='<',0,raw_data_copy['Employment.Length'])
raw_data_copy['Employment.Length'] = pd.to_numeric(raw_data_copy['Employment.Length'], errors = 'coerce')

Output

10.0    653
0.0     249
1.0     177
Name: Employment.Length, dtype: int64

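One more thing: when I comment out the second line, the one with 'year', I get the same output as in the first example. When I comment out the second line, the one with 'years', I get the same output as in the second example.

Third example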

raw_data_copy['Employment.Length'] = raw_data_copy['Employment.Length'].str.replace('years',' ')
#raw_data_copy['Employment.Length'] = raw_data_copy['Employment.Length'].str.replace('year',' ')
raw_data_copy['Employment.Length'] = np.where(raw_data_copy['Employment.Length'].str[:2]=='10',10, raw_data_copy['Employment.Length'])
raw_data_copy['Employment.Length'] = np.where(raw_data_copy['Employment.Length'].str[0]=='<',0,raw_data_copy['Employment.Length'])
raw_data_copy['Employment.Length'] = pd.to_numeric(raw_data_copy['Employment.Length'], errors = 'coerce')

Output

10.0    653
0.0     249
2.0     243
3.0     235
5.0     202
4.0     191
6.0     163
7.0     127
8.0     108
9.0      72
Name: Employment.Length, dtype: int64

Tags: python-3.x, pandas, data-preprocessing

Solution


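If you replace 'year' with ' ' first, every 'years' has already become ' s', so the subsequent str.replace('years', ' ') no longer finds anything to replace. A value such as '2 years' ends up as '2  s', which pd.to_numeric(..., errors='coerce') turns into NaN, so it disappears from the counts.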
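To make the ordering effect concrete, here is a minimal sketch using plain Python string replacement, which behaves like a literal (non-regex) pandas str.replace. The sample strings and the two helper names are made up for illustration and are not from the question's code.

```python
# Hypothetical sample values mirroring the categories in the question's column
samples = ["10+ years", "< 1 year", "2 years", "1 year"]

def years_then_year(text):
    # Longer token first: "years" is removed whole, then any leftover "year"
    return text.replace("years", " ").replace("year", " ")

def year_then_years(text):
    # Shorter token first: "years" becomes " s", so the second replace finds nothing
    return text.replace("year", " ").replace("years", " ")

for value in samples:
    print(repr(value), "->", repr(years_then_year(value)), "|", repr(year_then_years(value)))
```

With the shorter token first, the plural entries keep a stray 's', which is why only the '10+', '<' and '1 year' rows survive the numeric conversion in the second example.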

Instead of several successive replacements, use a single one with an optional s: 'year[s]?'

import pandas as pd
s = pd.Series(['year', 'years', 'foo'])

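# regex=True is needed on pandas 2.0+, where str.replace treats the pattern as a literal string by default
s.str.replace('year[s]?', ' ', regex=True)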
#0       
#1       
#2    foo
#dtype: object
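Applied to a column like the one in the question, the single-pattern approach might look like the sketch below. The sample Series is made up to mirror the question's categories, and mapping '10+' to 10 and '< 1' to 0 follows what the question's np.where lines do. Treat it as one possible cleanup, not the only one.

```python
import pandas as pd

# Hypothetical sample mirroring the categories in the question's column
raw = pd.Series(["10+ years", "< 1 year", "2 years", "1 year", "."],
                name="Employment.Length")

# Remove "year" or "years" in a single pass, then trim surrounding whitespace
cleaned = raw.str.replace(r"years?", "", regex=True).str.strip()

# Map the two special categories to plain numbers (exact whole-value matches)
cleaned = cleaned.replace({"10+": "10", "< 1": "0"})

# Anything that is still not a number (here the "." entry) becomes NaN
numeric = pd.to_numeric(cleaned, errors="coerce")
print(numeric.tolist())
```

Because one pattern handles both the singular and the plural form, the result no longer depends on the order of the replacement lines.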
