Pandas: compare a column against a list object and another column containing int

Problem description

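I have the pandas DataFrame below, and I want to filter it by checking one column against a list object (a list of names) while also comparing another column against an integer value.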

DataFrame structure:

+------------+-----------------------+----------------------+-----------------+--------------------+------------+---------+
| Number     | Caller                | Assignment group     | Assigned to     | Status(state)      | Location   |   Aging |
|------------+-----------------------+----------------------+-----------------+--------------------+------------+---------|
| INC0722882 | Shivam Verma          | RD-DI-Infra-Linux    | Karn Kumar      | Active             | IN-NDA02   |       2 |
| INC0786494 | Kanhaiya Kumar Mishra | RD-Hotspot-Team-APAC | Karn Kumar      | Active             | IN-NDA02   |       5 |
| INC0790029 | Akhil Garg            | RD-DI-Infra-Storage  | Amit Raj        | Awaiting User Info | IN-NDA02   |       3 |
| INC0743690 | Japesh Kumar          | RD-DI-Infra-Linux    | Shakir Chaudhry | Awaiting User Info | IN-NDA02   |       5 |
+------------+-----------------------+----------------------+-----------------+--------------------+------------+---------+

Pandas code:

from __future__ import print_function
from signal import signal, SIGPIPE, SIG_DFL
signal(SIGPIPE,SIG_DFL)

from tabulate import tabulate
import pandas as pd
##### Python pandas, widen output display to see more columns. ####
pd.set_option('display.max_rows', None)
pd.set_option('display.max_columns', None)
pd.set_option('display.width', None)
pd.set_option('expand_frame_repr', True)
##########################################################################################
def pprint_df(dframe):
    print(tabulate(dframe, headers='keys', tablefmt='psql', showindex=False))

names = ['Amit Raj','Andre Geurts','Andrzej Kamionek','Ankur Wason','Ashish Kumar','Carl Thijssen','Chris Masson','Daniel Chorazy','Devarishi Kumar','Elizabeth Tamayo','Eric Oomen','Gopinath Perumal','Jakub Kubera','Jeffrey Thompson','Jeroen Kwanten','Karn Kumar','Kenny Henderson','Manish Kumar','Mihai Pârlea','Mihai Reus','Naveen Kumar','Rafiq Khan','Rob Goossens','Robert in','Roger Smith','Santhoshkumar Krishnamoorthy','Shakir Chaudhry','Sonu Kumar','Suraj Budha','Szymon Kolodziejski','Szymon Kubera','Tony Olsson','Vetrivelan Rajagopalan','Yogesh Miglani','Abrar Ahmad']

col_name = ['Number','Caller','Assignment group','Assigned to','Status(state)','Location','Aging']

df = pd.read_excel('Backlog-April_24.xlsx', usecols=col_name, encoding='utf-8', index=False)
# df  = df[df['Assigned to'].isin(names)]  <-- This works perfectly with above dataframe

df  = df[df['Assigned to'].isin(names) & df['Aging'] >= 5]
print(df.dtypes)
pprint_df(df)

When I run the code above, I get an empty result, even if I convert the int to str.

$ ./pd_code.py
Number              object
Caller              object
Assignment group    object
Assigned to         object
Status(state)       object
Location            object
Aging               object
dtype: object
+----------+----------+--------------------+---------------+-----------------+------------+---------+
| Number   | Caller   | Assignment group   | Assigned to   | Status(state)   | Location   | Aging   |
|----------+----------+--------------------+---------------+-----------------+------------+---------|
+----------+----------+--------------------+---------------+-----------------+------------+---------+

Expected output:

Example:

+------------+-----------------------+----------------------+-----------------+--------------------+------------+---------+
| Number     | Caller                | Assignment group     | Assigned to     | Status(state)      | Location   |   Aging |
|------------+-----------------------+----------------------+-----------------+--------------------+------------+---------|
| INC0786494 | Kanhaiya Kumar Mishra | RD-Hotspot-Team-APAC | Karn Kumar      | Active             | IN-NDA02   |       5 |
| INC0743690 | Japesh Kumar          | RD-DI-Infra-Linux    | Shakir Chaudhry | Awaiting User Info | IN-NDA02   |       5 |
+------------+-----------------------+----------------------+-----------------+--------------------+------------+---------+

Tags: python, python-3.x, pandas, operators

Solution


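Just for posterity: we need to use boolean indexing here.

Boolean indexing:

Another common operation is the use of boolean vectors to filter the data. The operators are: | for or, & for and, and ~ for not. These must be grouped by using parentheses, because & binds more tightly than comparison operators such as >=. Without parentheses, df['Assigned to'].isin(names) & df['Aging'] >= 5 is evaluated as (df['Assigned to'].isin(names) & df['Aging']) >= 5, which is not the intended filter.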

df  = df[df['Assigned to'].isin(names) & (df['Aging'] >= 5)]

Or

df  = df[(df['Assigned to'].isin(names)) & (df['Aging'] >= 5)]
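For a quick check without the Excel file, here is a minimal sketch that rebuilds a few columns of the sample table by hand and applies the parenthesized filter. The shortened `names` list and the reduced set of columns are assumptions made for brevity; they are not the asker's full data.

```python
import pandas as pd

# Rows copied from the sample table in the question (only three columns kept).
df = pd.DataFrame({
    "Number": ["INC0722882", "INC0786494", "INC0790029", "INC0743690"],
    "Assigned to": ["Karn Kumar", "Karn Kumar", "Amit Raj", "Shakir Chaudhry"],
    "Aging": [2, 5, 3, 5],
})

# Shortened stand-in for the full list of names in the question.
names = ["Amit Raj", "Karn Kumar", "Shakir Chaudhry"]

# Each condition yields a boolean Series; the comparison is wrapped in
# parentheses so that it is evaluated before the & operator.
mask = df["Assigned to"].isin(names) & (df["Aging"] >= 5)
filtered = df[mask]

print(filtered)
```

Only the two rows with Aging equal to 5 remain, which matches the expected output in the question.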

There is also a very good, detailed write-up on operator precedence that is worth reading.
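One way to see the precedence rule for yourself is to ask Python how it parses the expression. The sketch below uses only the standard-library `ast` module; the variable names `isin_mask` and `aging` are hypothetical placeholders for the two sides of the filter and are never evaluated.

```python
import ast


def parsed(expr: str) -> str:
    """Return a text dump of the syntax tree Python builds for expr."""
    return ast.dump(ast.parse(expr, mode="eval"))


# The expression as written in the question, with no parentheses.
unparenthesized = parsed("isin_mask & aging >= 5")

# The two possible groupings, made explicit.
and_first = parsed("(isin_mask & aging) >= 5")
compare_first = parsed("isin_mask & (aging >= 5)")

# & binds tighter than >=, so the unparenthesized form groups the & first.
print(unparenthesized == and_first)      # True
print(unparenthesized == compare_first)  # False
```

This is why the comparison has to be wrapped in parentheses explicitly when combining conditions with & or |.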

