Multiple conditional statements with groupby in pandas

Question

I have a dataset similar to the following.

date,score
3/1/16,0.6369
5/1/16,-0.2023
6/1/16,0.04
7/1/16,0.0772
9/1/16,-0.4215
12/1/16,0.2960
15/1/16,0.25
15/1/16,0.7684

I want to apply the following conditions to the score.

Con1: if score > 0.05, count that as positive for that date
Con2: if -0.05 <= score <= 0.05, count that as neutral for that date
Con3: otherwise, count that as negative for that date
Also add a new column to the DataFrame alongside the score to hold the 'negative'/'positive'/'neutral' result

Expected output:

date, score, mood
3/1/16,0.6369, positive
5/1/16,-.2023, negative
6/1/16,0.04, neutral

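I have multiple scores for the same date. So I want to use groupby on multiple columns ('date' and 'score'), apply the if conditions, and add a new column ['mood'] to the DataFrame.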

What I tried:

import pandas as pd

df = pd.read_csv('file.csv')

def SortMood(df):
    df['mood'] = []  # empty column as a list in the df to store the mood
    for score in df['score']:
        if score > 0.05:
            df['mood'].append('positive')
        elif -0.05 <= score <= 0.05:
            df['mood'].append('neutral')
        else:
            df['mood'].append('negative')

I know this function is wrong (I get a ValueError), so any help is appreciated. Thank you.

Tags: python, pandas, csv

Answer
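
On the error itself: assigning an empty list to df['mood'] raises the ValueError because the list's length (0) does not match the number of rows, and a DataFrame column is not a Python list that can be grown row by row with append. A minimal row-wise sketch that keeps the question's if/elif logic might look like the following. The name classify_mood is a hypothetical helper, and the DataFrame is built inline from the question's sample data rather than read from file.csv.

```python
import pandas as pd

# Sample data copied from the question (assumption: file.csv holds these rows).
df = pd.DataFrame({
    "date": ["3/1/16", "5/1/16", "6/1/16", "7/1/16",
             "9/1/16", "12/1/16", "15/1/16", "15/1/16"],
    "score": [0.6369, -0.2023, 0.04, 0.0772,
              -0.4215, 0.2960, 0.25, 0.7684],
})


def classify_mood(score):
    """Hypothetical helper: map a single score to a mood label."""
    if score > 0.05:
        return "positive"
    if -0.05 <= score <= 0.05:
        return "neutral"
    return "negative"


# Build the whole column in one assignment so its length matches the index.
df["mood"] = df["score"].apply(classify_mood)
print(df)
```

This calls the helper once per row in Python, so it is slower on large data than vectorized operations, but it mirrors the original attempt most directly.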


Use pd.cut to categorize your data:

import numpy as np

# Bins are right-closed by default: (-inf, -0.05], (-0.05, 0.05], (0.05, inf)
df['mood'] = pd.cut(df['score'],
                    bins=[-np.inf, -.05, .05, np.inf],
                    labels=['negative', 'neutral', 'positive'])

      date   score      mood
0   3/1/16  0.6369  positive
1   5/1/16 -0.2023  negative
2   6/1/16  0.0400   neutral
3   7/1/16  0.0772  positive
4   9/1/16 -0.4215  negative
5  12/1/16  0.2960  positive
6  15/1/16  0.2500  positive
7  15/1/16  0.7684  positive
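
One edge case is worth checking. With explicit bin edges, pd.cut uses right-closed intervals by default, so a score of exactly -0.05 lands in 'negative', whereas Con2 in the question treats it as neutral. None of the sample rows hit that boundary, so the output above is unaffected. The sketch below illustrates the difference on made-up boundary values (not from the question's data) and shows one possible workaround, assuming it is acceptable to shift the lower edge to the next representable float below -0.05 with np.nextafter.

```python
import numpy as np
import pandas as pd

# Boundary values chosen for illustration; they are not in the question's data.
scores = pd.Series([-0.06, -0.05, 0.0, 0.05, 0.06])
labels = ["negative", "neutral", "positive"]

# Default right-closed bins: (-inf, -0.05], (-0.05, 0.05], (0.05, inf)
default_moods = pd.cut(scores, bins=[-np.inf, -0.05, 0.05, np.inf], labels=labels)

# Assumed workaround: move the lower edge to the next float below -0.05,
# so that -0.05 itself falls into the neutral bin.
lower_edge = np.nextafter(-0.05, -np.inf)
adjusted_moods = pd.cut(scores, bins=[-np.inf, lower_edge, 0.05, np.inf], labels=labels)

print(pd.DataFrame({"score": scores,
                    "default": default_moods,
                    "adjusted": adjusted_moods}))
```

If exact boundary handling matters, conditions written out explicitly (as with Series.between, which is inclusive on both ends by default) may be easier to read than adjusted bin edges.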

Alternatively, numpy.select is useful for building a vectorized multi-condition column:

conditions = [
    df['score'].lt(-.05),
    df['score'].between(-.05, 0.05)
]

df['mood'] = np.select(conditions, ['negative', 'neutral'], default='positive')

      date   score      mood
0   3/1/16  0.6369  positive
1   5/1/16 -0.2023  negative
2   6/1/16  0.0400   neutral
3   7/1/16  0.0772  positive
4   9/1/16 -0.4215  negative
5  12/1/16  0.2960  positive
6  15/1/16  0.2500  positive
7  15/1/16  0.7684  positive
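
Neither approach needs groupby to label the rows, since the mood depends only on each row's own score. If the goal is also to count moods per date (one reading of "count that as positive for that date"), a grouped count can be layered on top once the mood column exists. The following is a sketch under that assumption; mood_counts is an illustrative name, and the data is the question's sample built inline.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "date": ["3/1/16", "5/1/16", "6/1/16", "7/1/16",
             "9/1/16", "12/1/16", "15/1/16", "15/1/16"],
    "score": [0.6369, -0.2023, 0.04, 0.0772,
              -0.4215, 0.2960, 0.25, 0.7684],
})

# Label each row first, using the same conditions as the np.select answer.
conditions = [
    df["score"].lt(-0.05),
    df["score"].between(-0.05, 0.05),
]
df["mood"] = np.select(conditions, ["negative", "neutral"], default="positive")

# Count rows per (date, mood) pair, then pivot the moods into columns.
# Dates are plain strings here, so they sort lexicographically.
mood_counts = df.groupby(["date", "mood"]).size().unstack(fill_value=0)
print(mood_counts)
```

Each row of mood_counts is a date and each column a mood, with 0 where a date has no score of that mood.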
