首页 > 解决方案 > 在 pandas 列中搜索字符串

问题描述

我试图在下面的 hard_skills_name 列中找到一个子字符串,就像我想要所有具有“Apple Products”作为硬技能的行一样。

在此处输入图像描述

我尝试了以下代码:

df.loc[df['hard_skills_name'].str.contains("Apple Products", case=False)]

但收到此错误:

KeyError                                  Traceback (most recent call last)
<ipython-input-49-acdcdfbdfd3d> in <module>
----> 1 df.loc[df['hard_skills_name'].str.contains("Apple Products", case=False)]

~/anaconda3/envs/python3/lib/python3.6/site-packages/pandas/core/indexing.py in __getitem__(self, key)
    877 
    878             maybe_callable = com.apply_if_callable(key, self.obj)
--> 879             return self._getitem_axis(maybe_callable, axis=axis)
    880 
    881     def _is_scalar_access(self, key: Tuple):

~/anaconda3/envs/python3/lib/python3.6/site-packages/pandas/core/indexing.py in _getitem_axis(self, key, axis)
   1097                     raise ValueError("Cannot index with multidimensional key")
   1098 
-> 1099                 return self._getitem_iterable(key, axis=axis)
   1100 
   1101             # nested tuple slicing

~/anaconda3/envs/python3/lib/python3.6/site-packages/pandas/core/indexing.py in _getitem_iterable(self, key, axis)
   1035 
   1036         # A collection of keys
-> 1037         keyarr, indexer = self._get_listlike_indexer(key, axis, raise_missing=False)
   1038         return self.obj._reindex_with_indexers(
   1039             {axis: [keyarr, indexer]}, copy=True, allow_dups=True

~/anaconda3/envs/python3/lib/python3.6/site-packages/pandas/core/indexing.py in _get_listlike_indexer(self, key, axis, raise_missing)
   1252             keyarr, indexer, new_indexer = ax._reindex_non_unique(keyarr)
   1253 
-> 1254         self._validate_read_indexer(keyarr, indexer, axis, raise_missing=raise_missing)
   1255         return keyarr, indexer
   1256 

~/anaconda3/envs/python3/lib/python3.6/site-packages/pandas/core/indexing.py in _validate_read_indexer(self, key, indexer, axis, raise_missing)
   1296             if missing == len(indexer):
   1297                 axis_name = self.obj._get_axis_name(axis)
-> 1298                 raise KeyError(f"None of [{key}] are in the [{axis_name}]")
   1299 
   1300             # We (temporarily) allow for some missing keys with .loc, except in

KeyError: "None of [Float64Index([nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan,\n              nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan,\n              nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan,\n              nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan,\n              nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan,\n              nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan,\n              nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan,\n              nan, nan, nan, nan, nan, nan, nan, nan, nan],\n             dtype='float64')] are in the [index]"

标签: pythonpandasstringdataframedata-wrangling

解决方案


str.join()尝试在字符串搜索之前将字符串列表链接(临时)转换为逗号分隔的字符串:

df[df['hard_skills_name'].str.join(', ').str.contains("Apple Products", case=False)]

问题是由于您要搜索的字符串包含在列表中。您不能直接使用 搜索列表中的字符串.str.contains()。为了解决这个问题,您可以先将字符串列表转换为长字符串(例如,用逗号分隔子字符串),.str.join()然后再进行字符串搜索。


推荐阅读