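python - Pandas: comparing a column against a list of names together with another column containing int
Problem description
I have the pandas DataFrame below, and I want to filter it by checking one column against a list object (a list of names) and another column against an integer value.
DataFrame structure: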
+------------+-----------------------+----------------------+-----------------+--------------------+------------+---------+
| Number | Caller | Assignment group | Assigned to | Status(state) | Location | Aging |
|------------+-----------------------+----------------------+-----------------+--------------------+------------+---------|
| INC0722882 | Shivam Verma | RD-DI-Infra-Linux | Karn Kumar | Active | IN-NDA02 | 2 |
| INC0786494 | Kanhaiya Kumar Mishra | RD-Hotspot-Team-APAC | Karn Kumar | Active | IN-NDA02 | 5 |
| INC0790029 | Akhil Garg | RD-DI-Infra-Storage | Amit Raj | Awaiting User Info | IN-NDA02 | 3 |
| INC0743690 | Japesh Kumar | RD-DI-Infra-Linux | Shakir Chaudhry | Awaiting User Info | IN-NDA02 | 5 |
+------------+-----------------------+----------------------+-----------------+--------------------+------------+---------+
Pandas code:
from __future__ import print_function
from signal import signal, SIGPIPE, SIG_DFL
signal(SIGPIPE,SIG_DFL)
from tabulate import tabulate
import pandas as pd
##### Python pandas, widen output display to see more columns. ####
pd.set_option('display.max_rows', None)
pd.set_option('display.max_columns', None)
pd.set_option('display.width', None)
pd.set_option('expand_frame_repr', True)
##########################################################################################
def pprint_df(dframe):
    print(tabulate(dframe, headers='keys', tablefmt='psql', showindex=False))
names = ['Amit Raj','Andre Geurts','Andrzej Kamionek','Ankur Wason','Ashish Kumar','Carl Thijssen','Chris Masson','Daniel Chorazy','Devarishi Kumar','Elizabeth Tamayo','Eric Oomen','Gopinath Perumal','Jakub Kubera','Jeffrey Thompson','Jeroen Kwanten','Karn Kumar','Kenny Henderson','Manish Kumar','Mihai Pârlea','Mihai Reus','Naveen Kumar','Rafiq Khan','Rob Goossens','Robert in','Roger Smith','Santhoshkumar Krishnamoorthy','Shakir Chaudhry','Sonu Kumar','Suraj Budha','Szymon Kolodziejski','Szymon Kubera','Tony Olsson','Vetrivelan Rajagopalan','Yogesh Miglani','Abrar Ahmad']
col_name = ['Number','Caller','Assignment group','Assigned to','Status(state)','Location','Aging']
df = pd.read_excel('Backlog-April_24.xlsx', usecols=col_name, encoding='utf-8', index=False)
# df = df[df['Assigned to'].isin(names)] <-- This works perfectly with above dataframe
df = df[df['Assigned to'].isin(names) & df['Aging'] >= 5]
print(df.dtypes)
pprint_df(df)
When I run the code above, I get an empty DataFrame, even if I cast the int to str.
$ ./pd_code.py
Number object
Caller object
Assignment group object
Assigned to object
Status(state) object
Location object
Aging object
dtype: object
+----------+----------+--------------------+---------------+-----------------+------------+---------+
| Number | Caller | Assignment group | Assigned to | Status(state) | Location | Aging |
|----------+----------+--------------------+---------------+-----------------+------------+---------|
+----------+----------+--------------------+---------------+-----------------+------------+---------+
Expected output:
Example:
+------------+-----------------------+----------------------+-----------------+--------------------+------------+---------+
| Number | Caller | Assignment group | Assigned to | Status(state) | Location | Aging |
|------------+-----------------------+----------------------+-----------------+--------------------+------------+---------|
| INC0786494 | Kanhaiya Kumar Mishra | RD-Hotspot-Team-APAC | Karn Kumar | Active | IN-NDA02 | 5 |
| INC0743690 | Japesh Kumar | RD-DI-Infra-Linux | Shakir Chaudhry | Awaiting User Info | IN-NDA02 | 5 |
+------------+-----------------------+----------------------+-----------------+--------------------+------------+---------+
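Solution
Just for posterity: we need to use boolean indexing.
Boolean indexing:
Another common operation is the use of boolean vectors to filter the data. The operators are: | for or, & for and, and ~ for not. These must be grouped by using parentheses, because & binds more tightly than comparison operators such as >=, so a & b >= 5 is evaluated as (a & b) >= 5.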
df = df[df['Assigned to'].isin(names) & (df['Aging'] >= 5)]
or
df = df[(df['Assigned to'].isin(names)) & (df['Aging'] >= 5)]
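As a self-contained check of the parenthesized filter, here is a minimal sketch. It assumes the four sample rows from the question with only three of the columns, a shortened names list, and an integer Aging column; the variable names in_names, old_enough, and filtered are hypothetical and used only for illustration.

```python
import pandas as pd

# Four rows copied from the question; only three columns are kept for brevity.
df = pd.DataFrame({
    "Number": ["INC0722882", "INC0786494", "INC0790029", "INC0743690"],
    "Assigned to": ["Karn Kumar", "Karn Kumar", "Amit Raj", "Shakir Chaudhry"],
    "Aging": [2, 5, 3, 5],
})

# Shortened version of the names list from the question (assumption).
names = ["Amit Raj", "Karn Kumar", "Shakir Chaudhry"]

# Build each boolean mask separately, then combine them with &.
in_names = df["Assigned to"].isin(names)
old_enough = df["Aging"] >= 5
filtered = df[in_names & old_enough]

print(filtered["Number"].tolist())  # ['INC0786494', 'INC0743690']
```

Naming each mask before combining them is one way to sidestep the precedence problem entirely, since no comparison shares an expression with &.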
There is also a very good, detailed explanation of operator precedence that is worth reading.
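To see how Python groups the two spellings, one option is to inspect the parse tree with the standard-library ast module. This is only an illustrative sketch: a and b stand in for the two column expressions, and top_level_node is a hypothetical helper, not part of pandas.

```python
import ast

def top_level_node(expr):
    """Return the class name of the outermost operation in expr."""
    return type(ast.parse(expr, mode="eval").body).__name__

# Without parentheses the comparison is outermost: (a & b) >= 5.
without_parens = top_level_node("a & b >= 5")

# With parentheses the bitwise AND is outermost: a & (b >= 5).
with_parens = top_level_node("a & (b >= 5)")

print(without_parens)  # Compare
print(with_parens)     # BinOp
```

In the unparenthesized form the & is applied first and its result is then compared with 5, which is why the original filter did not select the expected rows.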