Enumerating rows in a pandas DataFrame

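Problem description

I have a data frame in which every block of 5 rows belongs to either class A or class B. For example, the first 5 rows belong to class A, the next 5 rows to class B, the next 5 rows to class A, then another 5 to class B, and so on.

My data frame looks like this: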

    17   18   19   20    class
0  190  222  178  214  class_A
1  190  220  178  214  class_A
2  185  221  178  207  class_A
3  186  221  179  207  class_A
4  182  220  174  207  class_A
5  182  227  193  227  class_B
6  183  224  194  227  class_B
7  190  225  196  229  class_B
8  189  227  198  231  class_B
9  190  226  198  229  class_B

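My current problem is that the labels are correct only for the first 10 rows. If I have 15 rows, rows 10 to 14 are labeled Class_C, and with 20 rows, rows 15 to 19 are labeled Class_D.

This is what I am currently doing: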

pixel_index = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20]
indices = sorted(list(range(0,int(my_array.shape[0]/5)))*5)
class_dict = dict(zip(range(0,int((my_array.shape[0]/5))), string.ascii_uppercase ))
target_names = ["Class_" + c for c in class_dict.values()]
X = pd.DataFrame(my_array, columns= pixel_index)
y = pd.Categorical.from_codes(indices,target_names)
X.join(pd.Series(y,name='class'))
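
The cause becomes visible when the integer codes built by this snippet are printed: every 5-row block receives its own code, so a new category appears for each block. A minimal sketch, assuming a hypothetical 20-row random array stands in for `my_array`:

```python
import string

import numpy as np

# Hypothetical stand-in for my_array: 20 rows, 20 columns of random values.
my_array = np.random.rand(20, 20)

n_blocks = my_array.shape[0] // 5

# Same construction as in the snippet above: one distinct code per 5-row block.
indices = sorted(list(range(n_blocks)) * 5)
class_dict = dict(zip(range(n_blocks), string.ascii_uppercase))
target_names = ["Class_" + c for c in class_dict.values()]

print(indices)       # the codes run 0, 1, 2, 3 rather than 0, 1, 0, 1
print(target_names)  # so four category names are generated
```

With 20 rows there are four blocks, hence four codes and four names, which is where Class_C and Class_D come from.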

Expected output

          16        17        18        19        20    class  
0   0.058723  0.957086  0.340504  0.487644  0.810331  Class_A  
1   0.957106  0.906153  0.980786  0.407397  0.161386  Class_A  
2   0.911219  0.532552  0.543188  0.914856  0.910459  Class_A  
3   0.098517  0.967793  0.053691  0.716490  0.321336  Class_A  
4   0.688776  0.799750  0.242053  0.471356  0.169656  Class_A  
5   0.299303  0.684973  0.439007  0.555809  0.981216  Class_B  
6   0.306941  0.620774  0.282115  0.909423  0.067088  Class_B  
7   0.393058  0.196038  0.275761  0.463923  0.001078  Class_B  
8   0.000752  0.023837  0.192975  0.336385  0.895855  Class_B  
9   0.687067  0.171965  0.640440  0.141899  0.396111  Class_B  
10  0.106006  0.683805  0.798161  0.734071  0.233504  Class_A  
11  0.048247  0.687286  0.451302  0.827995  0.746302  Class_A  
12  0.410207  0.152911  0.007241  0.788971  0.486820  Class_A  
13  0.562021  0.930720  0.624477  0.383298  0.048881  Class_A  
14  0.387534  0.934789  0.115663  0.913763  0.102637  Class_A  
15  0.983388  0.609524  0.178221  0.187325  0.627132  Class_B  
16  0.211271  0.951792  0.156106  0.543936  0.106595  Class_B  
17  0.374171  0.375149  0.677240  0.174649  0.429010  Class_B  
18  0.092739  0.919603  0.741347  0.927791  0.095581  Class_B  
19  0.354681  0.919875  0.226072  0.935013  0.232503  Class_B  
20  0.545493  0.267462  0.133207  0.994136  0.429743  Class_A  
21  0.086750  0.106376  0.673137  0.591182  0.369256  Class_A  
22  0.317830  0.896352  0.503860  0.651258  0.214815  Class_A  
23  0.621201  0.754447  0.204289  0.678926  0.627512  Class_A  
24  0.682076  0.004520  0.610102  0.393055  0.908849  Class_A

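For example, if I have 500 rows, I want them labeled in that order: the first 5 rows are always Class_A, the next 5 rows are Class_B, and the pattern keeps alternating in that order.

Tags: python, python-3.x, pandas, numpy, dataframe

Solution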


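import pandas as pd

# Class letters to cycle through; use list('AB') to alternate between only two classes.
class_names = list('ABCD')
df = pd.DataFrame({'X': range(0, 100), 'Y': range(100, 200)})
target_names = ["Class_" + c for c in class_names]

# Build one full cycle of labels: 5 rows per class.
class_col = []
for name in target_names:
    class_col += [name]*5

# Repeat the cycle to cover the frame; this assumes the number of rows
# is an exact multiple of 5*len(target_names).
n_sets = df.shape[0]//(5*len(target_names))
class_col = class_col*n_sets
df['class'] = class_col
print(df.head(20))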

Output

     X    Y    class
0    0  100  Class_A
1    1  101  Class_A
2    2  102  Class_A
3    3  103  Class_A
4    4  104  Class_A
5    5  105  Class_B
6    6  106  Class_B
7    7  107  Class_B
8    8  108  Class_B
9    9  109  Class_B
10  10  110  Class_C
11  11  111  Class_C
12  12  112  Class_C
13  13  113  Class_C
14  14  114  Class_C
15  15  115  Class_D
16  16  116  Class_D
17  17  117  Class_D
18  18  118  Class_D
19  19  119  Class_D
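
The list-repetition approach only fits when the row count is an exact multiple of one full cycle (5 rows times the number of classes); otherwise the label list and the frame differ in length and the assignment raises an error. One possible workaround, sketched here with NumPy rather than taken from the answer, is to build one cycle with `np.repeat` and let `np.resize` tile it to the exact length. The 25-row frame and the name `block_size` are assumptions for illustration:

```python
import numpy as np
import pandas as pd

block_size = 5
target_names = ["Class_A", "Class_B"]

# Hypothetical frame with 25 rows, which is not a multiple of 2 * 5.
df = pd.DataFrame({"X": range(25)})

# One full cycle: Class_A x5 followed by Class_B x5.
one_cycle = np.repeat(target_names, block_size)

# np.resize fills the requested length with repeated copies of the cycle,
# cutting the last copy short where needed.
df["class"] = np.resize(one_cycle, len(df))

print(df["class"].tolist())
```

The last, incomplete cycle is simply truncated, so rows 20 to 24 receive Class_A.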

Replace your code with this:

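# my_array is the array from the question (20 columns).
pixel_index = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10,
    11, 12, 13, 14, 15, 16, 17, 18, 19, 20]

df = pd.DataFrame(my_array, columns=pixel_index)

# Class letters to cycle through; list('AB') gives the A/B alternation from the question.
class_names = list('ABCD')
target_names = ["Class_" + c for c in class_names]

# One full cycle of labels, 5 rows per class.
class_col = []
for name in target_names:
    class_col += [name]*5

# Repeat the cycle; assumes the row count is an exact multiple of 5*len(target_names).
n_sets = df.shape[0]//(5*len(target_names))
class_col = class_col*n_sets
df['class'] = class_col
print(df.head(20))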
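
If the categorical column from the question should be kept, a smaller change may be enough: wrap the block number around the number of classes with a modulo before calling `pd.Categorical.from_codes`. This is a sketch under assumptions, not the answer's method; the random 25-row `my_array` is a hypothetical stand-in for the real data.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for my_array: 25 rows, 20 pixel columns.
my_array = np.random.rand(25, 20)
pixel_index = list(range(1, 21))
target_names = ["Class_A", "Class_B"]

X = pd.DataFrame(my_array, columns=pixel_index)

# Row i belongs to block i // 5; taking that modulo the number of classes
# makes the codes cycle 0, 1, 0, 1, ... instead of growing to 2, 3, ...
codes = (np.arange(len(X)) // 5) % len(target_names)
y = pd.Categorical.from_codes(codes, categories=target_names)

result = X.join(pd.Series(y, name="class"))
print(result["class"].value_counts())
```

Because the codes are computed per row, this works for any number of rows and any number of entries in `target_names`.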
