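python - Search for a string in a pandas column
Problem description
I'm trying to find a substring in the hard_skills_name column; for example, I want all rows that have "Apple Products" as a hard skill.
I tried the following code: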
df.loc[df['hard_skills_name'].str.contains("Apple Products", case=False)]
but got this error:
KeyError Traceback (most recent call last)
<ipython-input-49-acdcdfbdfd3d> in <module>
----> 1 df.loc[df['hard_skills_name'].str.contains("Apple Products", case=False)]
~/anaconda3/envs/python3/lib/python3.6/site-packages/pandas/core/indexing.py in __getitem__(self, key)
877
878 maybe_callable = com.apply_if_callable(key, self.obj)
--> 879 return self._getitem_axis(maybe_callable, axis=axis)
880
881 def _is_scalar_access(self, key: Tuple):
~/anaconda3/envs/python3/lib/python3.6/site-packages/pandas/core/indexing.py in _getitem_axis(self, key, axis)
1097 raise ValueError("Cannot index with multidimensional key")
1098
-> 1099 return self._getitem_iterable(key, axis=axis)
1100
1101 # nested tuple slicing
~/anaconda3/envs/python3/lib/python3.6/site-packages/pandas/core/indexing.py in _getitem_iterable(self, key, axis)
1035
1036 # A collection of keys
-> 1037 keyarr, indexer = self._get_listlike_indexer(key, axis, raise_missing=False)
1038 return self.obj._reindex_with_indexers(
1039 {axis: [keyarr, indexer]}, copy=True, allow_dups=True
~/anaconda3/envs/python3/lib/python3.6/site-packages/pandas/core/indexing.py in _get_listlike_indexer(self, key, axis, raise_missing)
1252 keyarr, indexer, new_indexer = ax._reindex_non_unique(keyarr)
1253
-> 1254 self._validate_read_indexer(keyarr, indexer, axis, raise_missing=raise_missing)
1255 return keyarr, indexer
1256
~/anaconda3/envs/python3/lib/python3.6/site-packages/pandas/core/indexing.py in _validate_read_indexer(self, key, indexer, axis, raise_missing)
1296 if missing == len(indexer):
1297 axis_name = self.obj._get_axis_name(axis)
-> 1298 raise KeyError(f"None of [{key}] are in the [{axis_name}]")
1299
1300 # We (temporarily) allow for some missing keys with .loc, except in
KeyError: "None of [Float64Index([nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan,\n nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan,\n nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan,\n nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan,\n nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan,\n nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan,\n nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan,\n nan, nan, nan, nan, nan, nan, nan, nan, nan],\n dtype='float64')] are in the [index]"
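A minimal reproduction may help show where the all-NaN Float64Index comes from. It assumes, as the answer suggests, that each cell of hard_skills_name holds a Python list of strings; the sample data below is hypothetical.

```python
import pandas as pd

# Hypothetical data: each cell holds a list of strings, not a single string.
df = pd.DataFrame({"hard_skills_name": [["Apple Products", "Excel"], ["Python"]]})

# .str.contains cannot search inside a list, so every element comes back NaN
# and the "mask" is a float Series of NaN rather than booleans.
mask = df["hard_skills_name"].str.contains("Apple Products", case=False)
print(mask.dtype, mask.isna().all())

# .loc treats a non-boolean Series as a list of row labels; no row is
# labelled NaN, so pandas raises KeyError ("None of [...] are in the [index]").
try:
    df.loc[mask]
    raised = None
except KeyError as exc:
    raised = type(exc)
print(raised)
```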
Solution
str.join()
Try (temporarily) joining each list of strings into a comma-separated string before doing the string search:
df[df['hard_skills_name'].str.join(', ').str.contains("Apple Products", case=False)]
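The problem is that the strings you are searching for are stored inside lists. You cannot search for a string inside a list directly with .str.contains(): it returns NaN for every element that is not a string, so the mask in the question is all NaN (hence the Float64Index of nan in the KeyError), and .loc then treats that non-boolean Series as a list of row labels. To fix this, first convert each list of strings into one long string (for example, with the substrings separated by commas) using .str.join(), and then run the string search.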
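The join-then-search approach can be sketched end to end as follows. The DataFrame is hypothetical, and one row is deliberately missing: .str.join() leaves missing cells as NaN, so na=False is added (an assumption beyond the original answer) to keep the mask purely boolean. The second mask is a swapped-in alternative, not part of the answer: it compares whole list elements with Series.apply instead of searching a joined string.

```python
import pandas as pd

# Hypothetical sample data: lists of skills, plus one missing row.
df = pd.DataFrame({
    "name": ["Ann", "Bob", "Cy", "Dee"],
    "hard_skills_name": [
        ["Apple Products", "Excel"],
        ["Python", "SQL"],
        ["apple products", "Design"],
        None,
    ],
})

# Join each list into one comma-separated string, then search it.
# Missing cells stay NaN after the join; na=False maps them to False.
joined = df["hard_skills_name"].str.join(", ")
mask = joined.str.contains("Apple Products", case=False, na=False)
apple_rows = df[mask]
print(apple_rows)

# Alternative: exact, case-insensitive match against each list element.
# Non-list cells (e.g. None) count as no match.
exact = df["hard_skills_name"].apply(
    lambda skills: isinstance(skills, list)
    and any(s.lower() == "apple products" for s in skills)
)
print(df[exact])
```

The joined-string search also matches substrings (a skill such as "Apple Products Support" would count), while the element-wise version only matches a skill that is exactly "Apple Products", ignoring case.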