首页 > 解决方案 > 在 pandas 列中出现少于 K 次的掩码值(不区分大小写的比较)

问题描述

我想使用 Python 在 Pandas 数据框中搜索整个列“仓库”,如果单元格值出现 3 次以上,我想将相同的值写入 GeneralDescription 列。我正在尝试编写适用于数千行并忽略大小写值的代码。这是我尝试完成此操作的代码,它仅输出出现超过 3 次的值,但不向 GeneralDescription 列写入任何内容。我究竟做错了什么?任何帮助是极大的赞赏。

import pandas as pd
from collections import Counter
import numpy as np

data= [[2,'Empty','Empty'],[3,'General Liability','Empty'],[4,'WRS','Empty'],[5,'WRS','Empty'],[6,'CENTRAL','Empty'],[7,'General Liability','Empty'],[8,'CENTRAL','Empty'],[9,'wrs','Empty'],[10,'WRS','Empty'],[11,'GENERAL LIABILITY','Empty'],[12,'General Liability','Empty']]
df1=pd.DataFrame(data,columns=['LineNum','Warehouse','GeneralDescription'])

vc=df1.Warehouse.value_counts()
#print (vc[vc>3].index[0])

counts=Counter(df1.Warehouse.str.lower())
df1[df1.Warehouse.str.lower().isin([key for key in counts if counts[key]>3])].fillna(df1['GeneralDescription']) 

df1

    LineNum Warehouse           GeneralDescription
0   2       Empty               Empty
1   3       General Liability   Empty
2   4       WRS                 Empty
3   5       WRS                 Empty
4   6       CENTRAL             Empty
5   7       General Liability   Empty
6   8       CENTRAL             Empty
7   9       wrs                 Empty
8  10       WRS                 Empty
9  11       GENERAL LIABILITY   Empty
10 12       General Liability   Empty

df2 期望的结果

      LineNum Warehouse           GeneralDescription
0     2                         
1     3       General Liability   General Liability
2     4       WRS                 WRS
3     5       WRS                 WRS
4     6       CENTRAL             
5     7       General Liability   General Liability
6     8       CENTRAL             
7     9       wrs                 WRS
8    10       WRS                 WRS
9    11       GENERAL LIABILITY   General Liability
10   12       General Liability   General Liability

标签: pythonstringpandasdataframe

解决方案


您可以按大小写规范化列str.title,然后使用value_counts+map创建掩码。

i = df1.Warehouse.replace('Empty', np.nan).str.title()
df1['GeneralDescription'] = df1.Warehouse.where(i.map(i.value_counts()).gt(3))

print(df1)
    LineNum          Warehouse GeneralDescription
0         2              Empty                NaN
1         3  General Liability  General Liability
2         4                WRS                WRS
3         5                WRS                WRS
4         6            CENTRAL                NaN
5         7  General Liability  General Liability
6         8            CENTRAL                NaN
7         9                wrs                wrs
8        10                WRS                WRS
9        11  GENERAL LIABILITY  GENERAL LIABILITY
10       12  General Liability  General Liability

推荐阅读