Setting a range of values as the variables of a bar chart - python

Problem description

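In my data, I have a column that shows one of the following options: 'NOT_TESTED', 'NOT_COMPLETED', 'TOO_LOW', or a value between 150 and 190 in steps of 5 (such as 150, 155, 160, etc.).
I am trying to draw a bar chart that shows how many times each of these appears in the column, including each individual number.
So the bar chart should have these variables on the x-axis: 'NOT_TESTED', 'NOT_COMPLETED', 'TOO_LOW', 150, 155, 160, and so on.
The height of each bar should be the number of times the value appears in the column.
This is the code I have tried that gets me closest to my goal; however, all the numbers (150-190) output 1 as the bar value, so all those bars have the same height.
This does not match the data, and I don't know how to move forward.
I am new to this, so any guidance would be greatly appreciated!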

num_range = list(range(150,191, 5))
OUTCOMES = ['NOT_TESTED', 'NOT_COMPLETED', 'TOO_LOW']
OUTCOMES.extend(num_range)
df = df.append(pd.DataFrame(num_range, 
       columns=['PT1']),
       ignore_index = True)
df["outcomes_col"] = df["PT1"].astype ("category")
df["outcomes_col"].cat.set_categories(OUTCOMES , inplace = True )
sns.countplot(x= "outcomes_col", data=df, palette='Magma')
plt.xticks(rotation = 90)
plt.ylabel('Amount')
plt.xlabel('Outcomes')
plt.title("Outcomes per Testing")
plt.show()


pd.DataFrame({'ID': {0: 'GF342',  1: 'IF874',  2: 'FH386',  3: 'KJ190',  4: 'TY748',  5: 'YT947',  6: 'DF063',  7: 'ET512',  8: 'GC714',  9: 'SD978',  10: 'EF472',  11: 'PL489',  12: 'AZ315',  13: 'OL821',  14: 'HN765',  15: 'ED589'}, 'Location': {0: 'Q1',  1: 'Q3',  2: 'Q1',  3: 'Q3',  4: 'Q3',  5: 'Q4',  6: 'Q3',  7: 'Q1',  8: 'Q2',  9: 'Q3',  10: 'Q1',  11: 'Q2',  12: 'Q1',  13: 'Q1',  14: 'Q3',  15: 'Q1'}, 'NEW': {0: 'YES',  1: 'NO',  2: 'NO',  3: 'YES',  4: 'YES',  5: 'NO',  6: 'NO',  7: 'YES',  8: 'NO',  9: 'NO',  10: 'NO',  11: 'YES',  12: 'NO',  13: 'YES',  14: 'YES',  15: 'YES'}, 'YEAR': {0: 2021,  1: 2018,  2: 2019,  3: 2021,  4: 2021,  5: 2019,  6: 2019,  7: 2021,  8: 2018,  9: 2019,  10: 2018,  11: 2021,  12: 2018,  13: 2021,  14: 2021,  15: 2021}, 'PT1': {0: '',  1: 'NOT_TESTED',  2: '',  3: 'NOT_FINISHED',  4: '165',  5: '',  6: '180',  7: '145',  8: '155',  9: '',  10: '',  11: '',  12: 'TOO_LOW',  13: '150',  14: '155',  15: ''}, 'PT2': {0: '',  1: '',  2: '',  3: '',  4: '',  5: 'TOO_LOW',  6: '',  7: '',  8: '160',  9: 'TOO_LOW',  10: '',  11: '',  12: '',  13: '',  14: '',  15: ''}, 'PT3': {0: '',  1: 'TOO_LOW',  2: '',  3: 'TOO_LOW',  4: '',  5: '',  6: '',  7: '',  8: '',  9: '',  10: '',  11: 'NOT_FINISHED',  12: '',  13: '185',  14: '',  15: '165'}, 'PT4': {0: '',  1: '',  2: '',  3: '',  4: '',  5: 165.0,  6: '',  7: '',  8: '',  9: '',  10: '',  11: '',  12: 180.0,  13: '',  14: '',  15: ''}})

This is not the whole dataset, just a part of it.

Tags: python, pandas, dataframe, matplotlib, seaborn

Solution


Starting from this dataframe (I replaced NOT_FINISHED with NOT_COMPLETED to match the code in your question; let me know whether this replacement is correct):

       ID Location  NEW  YEAR            PT1      PT2            PT3  PT4
0   GF342       Q1  YES  2021                                            
1   IF874       Q3   NO  2018     NOT_TESTED                 TOO_LOW     
2   FH386       Q1   NO  2019                                            
3   KJ190       Q3  YES  2021  NOT_COMPLETED                 TOO_LOW     
4   TY748       Q3  YES  2021            165                             
5   YT947       Q4   NO  2019                 TOO_LOW                 165
6   DF063       Q3   NO  2019            180                             
7   ET512       Q1  YES  2021            145                             
8   GC714       Q2   NO  2018            155      160                    
9   SD978       Q3   NO  2019                 TOO_LOW                    
10  EF472       Q1   NO  2018                                            
11  PL489       Q2  YES  2021                          NOT_COMPLETED     
12  AZ315       Q1   NO  2018        TOO_LOW                          180
13  OL821       Q1  YES  2021            150                     185     
14  HN765       Q3  YES  2021            155                             
15  ED589       Q1  YES  2021                                    165     

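If you are interested in a count plot of the 'PT1' column, you first have to define the categories you want to plot. You can use pandas.CategoricalDtype, which also lets you put these categories in order.
So you define a new 'outcomes_col' column: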

num_range = list(range(150,191, 5))
OUTCOMES = ['NOT_TESTED', 'NOT_COMPLETED', 'TOO_LOW']
OUTCOMES.extend([str(num) for num in num_range])
OUTCOMES = CategoricalDtype(OUTCOMES, ordered = True)
df["outcomes_col"] = df["PT1"].astype(OUTCOMES)
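
The numbers are converted to strings in the snippet above because the values in 'PT1' are stored as strings ('150', '155', ...), and a value is only kept when it matches a category exactly. This would also explain why the attempt in the question shows a count of 1 for every number: the integer rows appended to the dataframe are the only ones that match integer categories. A minimal sketch of this behaviour follows; the name `sample` and its values are made up for illustration and are not taken from the question's data:

```python
import pandas as pd
from pandas.api.types import CategoricalDtype

# Illustrative data: strings, like the 'PT1' column in the question.
sample = pd.Series(['150', '155', '155', 'TOO_LOW', '145', ''])

# Categories given as integers do not match the string values '150' and '155'.
int_dtype = CategoricalDtype(['TOO_LOW', 150, 155])
as_int_cats = sample.astype(int_dtype)

# Categories given as strings do match.
str_dtype = CategoricalDtype(['TOO_LOW', '150', '155'])
as_str_cats = sample.astype(str_dtype)

# Number of values that survived the conversion (everything else became NaN).
print(as_int_cats.notna().sum())
print(as_str_cats.notna().sum())
```

Values that are not among the categories, such as '145' and the empty string, become NaN and are therefore left out of the count plot.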

Then you can go on to plot this column:

sns.countplot(x="outcomes_col", data=df, palette='magma')
plt.xticks(rotation = 90)
plt.ylabel('Amount')
plt.xlabel('Outcomes')
plt.title("Outcomes per Testing")
    
plt.show()
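
The bar heights can also be checked numerically, without a plot. The following is a minimal sketch under the assumption that only the 'PT1' values from the dataframe above are needed; the names `pt1`, `outcomes` and `counts` are introduced here for illustration:

```python
import pandas as pd
from pandas.api.types import CategoricalDtype

# 'PT1' values copied from the dataframe above (with NOT_COMPLETED substituted).
pt1 = ['', 'NOT_TESTED', '', 'NOT_COMPLETED', '165', '', '180', '145',
       '155', '', '', '', 'TOO_LOW', '150', '155', '']
df = pd.DataFrame({'PT1': pt1})

outcomes = ['NOT_TESTED', 'NOT_COMPLETED', 'TOO_LOW']
outcomes.extend(str(num) for num in range(150, 191, 5))
df['outcomes_col'] = df['PT1'].astype(CategoricalDtype(outcomes, ordered=True))

# value_counts on a categorical column reports every category, including
# those with zero occurrences; sort_index puts them back in category order.
counts = df['outcomes_col'].value_counts().sort_index()
print(counts)

# Values outside the categories ('' and '145' here) became NaN and are not counted.
print(df['outcomes_col'].isna().sum())
```

The printed counts should correspond to the bar heights in the count plot, with a zero for every category that does not occur in the column.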

Complete code

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from pandas.api.types import CategoricalDtype


df = pd.DataFrame({'ID': {0: 'GF342',  1: 'IF874',  2: 'FH386',  3: 'KJ190',  4: 'TY748',  5: 'YT947',  6: 'DF063',  7: 'ET512',  8: 'GC714',  9: 'SD978',  10: 'EF472',  11: 'PL489',  12: 'AZ315',  13: 'OL821',  14: 'HN765',  15: 'ED589'}, 'Location': {0: 'Q1',  1: 'Q3',  2: 'Q1',  3: 'Q3',  4: 'Q3',  5: 'Q4',  6: 'Q3',  7: 'Q1',  8: 'Q2',  9: 'Q3',  10: 'Q1',  11: 'Q2',  12: 'Q1',  13: 'Q1',  14: 'Q3',  15: 'Q1'}, 'NEW': {0: 'YES',  1: 'NO',  2: 'NO',  3: 'YES',  4: 'YES',  5: 'NO',  6: 'NO',  7: 'YES',  8: 'NO',  9: 'NO',  10: 'NO',  11: 'YES',  12: 'NO',  13: 'YES',  14: 'YES',  15: 'YES'}, 'YEAR': {0: 2021,  1: 2018,  2: 2019,  3: 2021,  4: 2021,  5: 2019,  6: 2019,  7: 2021,  8: 2018,  9: 2019,  10: 2018,  11: 2021,  12: 2018,  13: 2021,  14: 2021,  15: 2021}, 'PT1': {0: '',  1: 'NOT_TESTED',  2: '',  3: 'NOT_COMPLETED',  4: '165',  5: '',  6: '180',  7: '145',  8: '155',  9: '',  10: '',  11: '',  12: 'TOO_LOW',  13: '150',  14: '155',  15: ''}, 'PT2': {0: '',  1: '',  2: '',  3: '',  4: '',  5: 'TOO_LOW',  6: '',  7: '',  8: '160',  9: 'TOO_LOW',  10: '',  11: '',  12: '',  13: '',  14: '',  15: ''}, 'PT3': {0: '',  1: 'TOO_LOW',  2: '',  3: 'TOO_LOW',  4: '',  5: '',  6: '',  7: '',  8: '',  9: '',  10: '',  11: 'NOT_COMPLETED',  12: '',  13: '185',  14: '',  15: '165'}, 'PT4': {0: '',  1: '',  2: '',  3: '',  4: '',  5: 165.0,  6: '',  7: '',  8: '',  9: '',  10: '',  11: '',  12: 180.0,  13: '',  14: '',  15: ''}})

num_range = list(range(150,191, 5))
OUTCOMES = ['NOT_TESTED', 'NOT_COMPLETED', 'TOO_LOW']
OUTCOMES.extend([str(num) for num in num_range])
OUTCOMES = CategoricalDtype(OUTCOMES, ordered = True)
df["outcomes_col"] = df["PT1"].astype(OUTCOMES)

sns.countplot(x="outcomes_col", data=df, palette='magma')
plt.xticks(rotation = 90)
plt.ylabel('Amount')
plt.xlabel('Outcomes')
plt.title("Outcomes per Testing")

plt.show()


