Removing differing pairs from a Pandas DataFrame

Problem description

I have a pandas DataFrame with 2 columns of text values:

import pandas as pd

df = pd.DataFrame({"text": ["how are you", "this is an apple", "how are you", "hello my friend", "how are you", "this is an apple", "are you ok", "are you ok"],
                  "type": ["question", "statement", "question", "statement", "statement", "question", "question", "question"]})

print(df)

               text       type
0       how are you   question
1  this is an apple  statement
2       how are you   question
3   hello my friend  statement
4       how are you  statement
5  this is an apple   question
6        are you ok   question
7        are you ok   question

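I want to find the pairs (2 or more identical values in the "text" column) that have different "type" column values, and remove them. For example, you can see that the value "how are you" has both "question" and "statement". So my result should be: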

               text       type
3   hello my friend  statement
6        are you ok   question
7        are you ok   question

because the text values 'are you ok' and 'hello my friend' each have only one unique "type" value.

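I tried remove_duplicates(), but it did not work well. I am thinking about grouping by the "text" column, but I don't know how to check whether a group has different/non-unique "type" column values.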

Tags: python, pandas

Solution


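This is a job for groupby().nunique(). Using transform('nunique') gives each row the number of distinct "type" values in its "text" group, so keeping the rows where that count equals 1 drops every text that appears with more than one type: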

df[df.groupby('text')['type'].transform('nunique')==1]

Output:

              text       type
3  hello my friend  statement
6       are you ok   question
7       are you ok   question
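
To make the one-liner easier to follow, here is a minimal step-by-step sketch using the same data as the question. The variable names (type_counts, consistent, consistent_alt) are illustrative and not part of the original answer; the groupby().filter() variant is an alternative that should give the same rows, shown only for comparison.

```python
import pandas as pd

df = pd.DataFrame({
    "text": ["how are you", "this is an apple", "how are you", "hello my friend",
             "how are you", "this is an apple", "are you ok", "are you ok"],
    "type": ["question", "statement", "question", "statement",
             "statement", "question", "question", "question"],
})

# Number of distinct "type" values within each "text" group,
# broadcast back so there is one count per original row.
type_counts = df.groupby("text")["type"].transform("nunique")

# Keep only the rows whose text is associated with exactly one type.
consistent = df[type_counts == 1]

# Alternative (illustrative): filter() keeps or drops whole groups
# based on a per-group condition.
consistent_alt = df.groupby("text").filter(lambda g: g["type"].nunique() == 1)

print(consistent)
```

The transform version is usually the better fit for large frames because it avoids calling a Python function once per group; the filter version may read more naturally when the condition is more involved.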
