Different results from str.contains and str.find

Question

As I understand it, both of these should give the same answer:

import pandas as pd

train = pd.read_csv('https://raw.github.com/mattdelhey/kaggle-titanic/master/Data/train.csv')
train.name.str.contains('Mr.').sum()
(train.name.str.find('Mr.')>0).sum()

But the output is:

647
517

What is the reason for the different results?

Tags: pandas

Answer


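The difference is that str.contains also matches Mrs. By default, str.contains treats the pattern as a regular expression, and . is a special regex character that matches any character, so 'Mr.' matches 'Mr' followed by anything, including the 's' in 'Mrs'. str.find, by contrast, searches for the literal substring.

You need to either escape the dot or pass the parameter regex=False: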

print(train.name.str.contains(r'Mr\.').sum())
517
print(train.name.str.contains('Mr.', regex=False).sum())
517
print((train.name.str.find('Mr.')>0).sum())
517
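
The same behaviour can be reproduced without downloading the dataset. The following is a minimal sketch on a small hand-made Series (the three names are only illustrative sample values, and the variable names are my own); it also shows re.escape from the standard library as another way to get a literal match.

```python
import re
import pandas as pd

# Small illustrative sample, not the full Titanic data
names = pd.Series([
    "Braund, Mr. Owen Harris",
    "Cumings, Mrs. John Bradley",
    "Heikkinen, Miss. Laina",
])

# Default: the pattern is a regex, so "." matches any character ("Mrs" matches too)
regex_hits = names.str.contains("Mr.")

# Three ways to match a literal dot instead
escaped_hits = names.str.contains(r"Mr\.")                # escape the dot by hand
literal_hits = names.str.contains("Mr.", regex=False)     # turn regex matching off
auto_escaped_hits = names.str.contains(re.escape("Mr."))  # let re.escape do the escaping

print(regex_hits.tolist())    # [True, True, False]
print(literal_hits.tolist())  # [True, False, False]
```

Passing regex=False is probably the simplest option when the search text is a fixed string; re.escape is handy when the text comes from a variable and regex matching is still wanted elsewhere in the pattern.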

Checking the difference:

# Names matched by each approach
a = train.loc[train.name.str.contains('Mr.'), 'name']
b = train.loc[(train.name.str.find('Mr.')>0), 'name']

# Align both on the index and keep rows matched by only one of them
c = pd.concat([a, b], axis=1, keys=('contains','find'))
c = c[c.isnull().any(axis=1)]
print(c)
                                              contains find
1    Cumings, Mrs. John Bradley (Florence Briggs Th...  NaN
3         Futrelle, Mrs. Jacques Heath (Lily May Peel)  NaN
8    Johnson, Mrs. Oscar W (Elisabeth Vilhelmina Berg)  NaN
9                  Nasser, Mrs. Nicholas (Adele Achem)  NaN
15                    Hewlett, Mrs. (Mary D Kingcome)   NaN
18   Vander Planke, Mrs. Julius (Emelia Maria Vande...  NaN
19                             Masselmani, Mrs. Fatima  NaN
25   Asplund, Mrs. Carl Oscar (Selma Augusta Emilia...  NaN
31      Spencer, Mrs. William Augustus (Marie Eugenie)  NaN
40      Ahlin, Mrs. Johan (Johanna Persdotter Larsson)  NaN
41   Turpin, Mrs. William John Robert (Dorothy Ann ...  NaN
49       Arnold-Franchi, Mrs. Josef (Josefine Franchi)  NaN
52            Harper, Mrs. Henry Sleeper (Myna Haxtun)  NaN
53   Faunthorpe, Mrs. Lizzie (Elizabeth Anne Wilkin...  NaN
66                        Nye, Mrs. (Elizabeth Ramell)  NaN
85   Backstrom, Mrs. Karl Alfred (Maria Mathilda Gu...  NaN
...
...
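
An alternative way to isolate the differing rows is to compare the two boolean masks directly. The sketch below uses a small made-up Series (the entry "Mr. Smith" is hypothetical and is there only to show a match at position 0). It also illustrates a caveat with the find-based test: str.find returns -1 when the substring is absent and 0 when it occurs at the very start, so `>= 0` is the safer comparison than `> 0`.

```python
import pandas as pd

# Illustrative sample; "Mr. Smith" is a hypothetical entry with the match at position 0
names = pd.Series([
    "Mr. Smith",
    "Braund, Mr. Owen Harris",
    "Cumings, Mrs. John Bradley",
])

positions = names.str.find("Mr.")          # literal search; -1 means "not found"
print(positions.tolist())                  # [0, 8, -1]

contains_mask = names.str.contains("Mr.")  # regex by default, so "Mrs" matches as well
find_mask = positions >= 0                 # ">= 0" keeps a match at position 0

# Rows matched by the regex search but not by the literal search
only_contains = names[contains_mask & ~find_mask]
print(only_contains.tolist())              # ['Cumings, Mrs. John Bradley']
```

In the Titanic data the counts agree with `> 0` because the names shown above start with a surname, but that is a property of the data rather than of the comparison.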
