Filtering data in a Pandas DataFrame by the type of the values

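Question

I have made a few attempts to filter my dataframe with .loc, using a condition based on the type of the data in one of my columns.

My goal is to apply a function to a column (using .apply) only on the rows whose value has a specific type.

I tried using "dtype", but my column holds 2 different types of values, so all I get is "object".

So when I run print(df.info(verbose=True)) I get: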

 #   Column               Non-Null Count  Dtype 
---  ------               --------------  ----- 
 0   address              26419 non-null  object
.
.
.

This is what I am trying to run:

import ipaddress as ipa
.
.
.
    df.loc['EXCEPTION'] = df.loc[isinstance(df['address'], ipa.IPv4Network)].apply(
        return_row_with_exception,
        axis=1)

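It should update only the 'EXCEPTION' column of the dataframe 'df', and only on the rows where the value in the 'address' column is of type IPv4Network. The function 'return_row_with_exception' returns the string content of 'EXCEPTION' for each row, based on rules that use the other columns of that row.

Unfortunately, I get the error below. Can someone help me solve this? :D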

Traceback (most recent call last):
  File "pythonProject1111\venv\lib\site-packages\pandas\core\indexes\base.py", line 2895, in get_loc
    return self._engine.get_loc(casted_key)
  File "pandas\_libs\index.pyx", line 70, in pandas._libs.index.IndexEngine.get_loc
  File "pandas\_libs\index.pyx", line 98, in pandas._libs.index.IndexEngine.get_loc
  File "pandas\_libs\index_class_helper.pxi", line 93, in pandas._libs.index.Int64Engine._check_type
KeyError: False

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "pythonProject1111\main.py", line 14, in <module>
    abc = lib_read_from_imap.process_abc(abc)
  File "pythonProject1111\libs\read_from_abc.py", line 178, in process_abc
    df_file_abc = scaexc.fill_scan_exception(df_file_abc)
  File "pythonProject1111\libs\process_scan_exception.py", line 80, in fill_scan_exception
    print(df.loc[isinstance(df['address'], ipa.IPv4Network)])
  File "pythonProject1111\venv\lib\site-packages\pandas\core\indexing.py", line 879, in __getitem__
    return self._getitem_axis(maybe_callable, axis=axis)
  File "pythonProject1111\venv\lib\site-packages\pandas\core\indexing.py", line 1110, in _getitem_axis
    return self._get_label(key, axis=axis)
  File "pythonProject1111\venv\lib\site-packages\pandas\core\indexing.py", line 1059, in _get_label
    return self.obj.xs(label, axis=axis)
  File "pythonProject1111\venv\lib\site-packages\pandas\core\generic.py", line 3491, in xs
    loc = self.index.get_loc(key)
  File "pythonProject1111\venv\lib\site-packages\pandas\core\indexes\base.py", line 2897, in get_loc
    raise KeyError(key) from err
KeyError: False

Thanks a lot!!

Tags: python, pandas, dataframe, conditional-statements

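Answer

As you mentioned, dtypes does not help when a column holds several types: it only reports "object". You can inspect the type of each individual value instead: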

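import pandas as pd

# Sample data: 'City' and 'Marks' deliberately mix value types.
employees = [('jack', 34, 'Sydney', 155),
            ('Riti', 31, 'Delhi', 177.5),
            ('Aadi', 16, 'Mumbai', 81),
            ('Mohit', 31, 45, 167),
            ('Veena', 12, 'Delhi', 'Serge'),
            ('Shaunak', 35, 'Mumbai', 135),
            ('Shaun', 35, 'Colombo', 111)
            ]
empDfObj = pd.DataFrame(employees, columns=['Name', 'Age', 'City', 'Marks'])
# Map every cell to its Python type, then count the types in each column.
# (On pandas 2.1+, applymap is deprecated in favor of DataFrame.map.)
empDfObj.applymap(type).apply(pd.value_counts).fillna(0)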

Here .apply runs value_counts on each column of types.

This gives you:

                 Name  Age  City  Marks
<class 'str'>     7.0  0.0   6.0      1
<class 'int'>     0.0  7.0   1.0      5
<class 'float'>   0.0  0.0   0.0      1

So you can even count how many values of each type a column holds :-)
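To go from counting types to filtering on them, the same per-value check can be turned into a boolean mask. The KeyError: False in the question most likely comes from isinstance(df['address'], ipa.IPv4Network) being evaluated once on the whole Series, which gives a plain False that .loc then looks up as a row label. Below is a minimal sketch of the mask approach. The sample data, the 'owner' column and the body of return_row_with_exception are made up for illustration, since the real ones are not shown in the question.

```python
import ipaddress as ipa

import pandas as pd

# Hypothetical sample data: 'address' mixes IPv4Network objects and plain strings.
df = pd.DataFrame({
    "address": [
        ipa.ip_network("10.0.0.0/24"),
        "not-a-network",
        ipa.ip_network("192.168.1.0/24"),
    ],
    "owner": ["alice", "bob", "carol"],  # assumed extra column
})
df["EXCEPTION"] = ""


def return_row_with_exception(row):
    # Placeholder for the asker's function; the real rules are not in the question.
    return f"exception for {row['owner']} on {row['address']}"


# Test each element of the column, not the Series object itself,
# to get one True/False per row.
is_network = df["address"].map(lambda value: isinstance(value, ipa.IPv4Network))

# Select the matching rows with the mask, and assign only into 'EXCEPTION'.
df.loc[is_network, "EXCEPTION"] = df.loc[is_network].apply(
    return_row_with_exception, axis=1
)
print(df)
```

Note that the assignment targets df.loc[is_network, 'EXCEPTION'] (rows, then column). df.loc['EXCEPTION'] = ... as written in the question would address a row labelled 'EXCEPTION' instead of the column.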

