Trying to get all column values not equal to 0.000000 in Python 3

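Problem description

I have a dataset like the one shown below. I am trying to collect every name in the feature column whose value in the importance column is not equal to 0.000000, and put the names straight into a list for immediate use. I have tried several things; the two main attempts are below:

Method 1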

new_features = []

for i in importance_ranking['importance']:
    if i > 0.000000:
        new_features.append(i)
        
new_features

Method 1 only collects the values of the importance column, but I want the values of the feature column, so I tried Method 2.

Method 2

features_to_use = []
for x,y in importance_ranking:
    if y > 0.000000:
        features_to_use.append(x)
        
features_to_use

Method 2 throws the following error:

Method 2 error

ValueError                                Traceback (most recent call last)
<ipython-input-1181-d1ec4f141ff9> in <module>()
      1 features_to_use = []
----> 2 for x,y in importance_ranking:
      3     if y > 0.000000:
      4         features_to_use.append(x)
      5 

ValueError: too many values to unpack (expected 2)

Any help is greatly appreciated.

Method 3 and error

features_to_use = []
for s,x,y in importance_ranking:
    if y > 0.000000:
        features_to_use.append(x)

features_to_use
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-1182-8ed92369130e> in <module>()
      1 features_to_use = []
----> 2 for s,x,y in importance_ranking:
      3     if y > 0.000000:
      4         features_to_use.append(x)
      5 

ValueError: too many values to unpack (expected 3)
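
A note on the two unpacking errors above: iterating over a DataFrame directly yields its column labels, not its rows, so `for x,y in importance_ranking` tries to unpack the 7-character string 'feature' into two names. The sketch below shows this on a small hypothetical stand-in for importance_ranking (three rows copied from the dataset) and one possible row-wise loop using itertuples. It is an illustration under those assumptions, not the only way to loop.

```python
import pandas as pd

# Hypothetical stand-in for importance_ranking (three rows from the dataset).
importance_ranking = pd.DataFrame({
    'feature': ['src_bytes', 'count', 'is_host_login'],
    'importance': [0.541433, 0.160338, 0.0],
})

# Iterating a DataFrame yields column labels, which is why unpacking fails.
labels = list(importance_ranking)

# itertuples(index=False) yields one (feature, importance) tuple per row,
# so each row unpacks cleanly into two names.
features_to_use = []
for name, score in importance_ranking.itertuples(index=False):
    if score > 0:
        features_to_use.append(name)

print(labels)
print(features_to_use)
```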

Dataset

    feature importance
1   src_bytes   0.541433
18  count   0.160338
30  dst_host_diff_srv_rate  0.074743
53  service_bgp 0.066960
31  dst_host_same_src_port_rate 0.045040
28  dst_host_srv_count  0.027176
9   num_compromised 0.016684
25  diff_srv_rate   0.008991
58  service_pm_dump 0.008533
62  service_auth    0.008270
29  dst_host_same_srv_rate  0.006760
2   dst_bytes   0.005153
33  dst_host_serror_rate    0.004642
6   hot 0.003985
32  dst_host_srv_diff_host_rate 0.003330
35  dst_host_rerror_rate    0.002923
34  dst_host_srv_serror_rate    0.002222
87  service_klogin  0.002135
116 flag_SH 0.001553
0   duration    0.001263
7   num_failed_logins   0.001125
22  rerror_rate 0.001011
27  dst_host_count  0.000917
4   wrong_fragment  0.000736
52  service_ntp_u   0.000489
37  flag_RSTOS0 0.000468
3   land    0.000449
111 service_tftp_u  0.000355
19  srv_count   0.000289
8   logged_in   0.000284
... ... ...
16  is_host_login   0.000000
40  service_Z39_50  0.000000
41  service_http_443    0.000000
43  service_other   0.000000
44  protocol_type_tcp   0.000000
45  service_link    0.000000
46  service_X11 0.000000
47  service_exec    0.000000
48  service_red_i   0.000000
49  service_http_2784   0.000000

Line used to create the DataFrame

importance_ranking = pd.DataFrame({'feature':all_cols, 'importance':dt.feature_importances_})

New test

#features_to_use = []
a,b = importance_ranking[0]
#for s,x,y in importance_ranking:
 #   if y > 0.000000:
     #   features_to_use.append(x)
#
#features_to_use


KeyError                                  Traceback (most recent call last)
~\Anaconda3\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
   2524             try:
-> 2525                 return self._engine.get_loc(key)
   2526             except KeyError:

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: 0

During handling of the above exception, another exception occurred:

KeyError                                  Traceback (most recent call last)
<ipython-input-1244-5d9e2e614219> in <module>()
      1 #features_to_use = []
----> 2 a,b = importance_ranking[0]
      3 #for s,x,y in importance_ranking:
      4  #   if y > 0.000000:
      5      #   features_to_use.append(x)

~\Anaconda3\lib\site-packages\pandas\core\frame.py in __getitem__(self, key)
   2137             return self._getitem_multilevel(key)
   2138         else:
-> 2139             return self._getitem_column(key)
   2140 
   2141     def _getitem_column(self, key):

~\Anaconda3\lib\site-packages\pandas\core\frame.py in _getitem_column(self, key)
   2144         # get column
   2145         if self.columns.is_unique:
-> 2146             return self._get_item_cache(key)
   2147 
   2148         # duplicate columns & possible reduce dimensionality

~\Anaconda3\lib\site-packages\pandas\core\generic.py in _get_item_cache(self, item)
   1840         res = cache.get(item)
   1841         if res is None:
-> 1842             values = self._data.get(item)
   1843             res = self._box_item_values(item, values)
   1844             cache[item] = res

~\Anaconda3\lib\site-packages\pandas\core\internals.py in get(self, item, fastpath)
   3841 
   3842             if not isna(item):
-> 3843                 loc = self.items.get_loc(item)
   3844             else:
   3845                 indexer = np.arange(len(self.items))[isna(self.items)]

~\Anaconda3\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
   2525                 return self._engine.get_loc(key)
   2526             except KeyError:
-> 2527                 return self._engine.get_loc(self._maybe_cast_indexer(key))
   2528 
   2529         indexer = self.get_indexer([key], method=method, tolerance=tolerance)

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: 0
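
A note on the KeyError above: `importance_ranking[0]` asks for a column labelled 0, and the DataFrame only has the columns 'feature' and 'importance'. If the intent was to unpack the first row, positional indexing with `.iloc` is one option. The sketch below uses a small hypothetical stand-in for importance_ranking.

```python
import pandas as pd

# Hypothetical stand-in for importance_ranking (two rows from the dataset).
importance_ranking = pd.DataFrame({
    'feature': ['src_bytes', 'count'],
    'importance': [0.541433, 0.160338],
})

# df[key] selects a column by label; there is no column labelled 0.
# .iloc[0] selects the first row by position, and the resulting Series
# unpacks into its two values.
a, b = importance_ranking.iloc[0]
print(a, b)
```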

Tags: python, python-3.x, pandas, dataframe, anaconda

Solution


I think the best approach is to use boolean indexing:

df = importance_ranking[importance_ranking['importance']>0.000000]

Then get the feature names as a list (the column is named 'feature', not 'features'):

features = df['feature'].tolist()
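
The boolean-indexing approach can also be written as a single step with `.loc`. Here is a minimal self-contained sketch, assuming a small hypothetical stand-in for importance_ranking built from a few rows of the dataset in the question:

```python
import pandas as pd

# Hypothetical stand-in for importance_ranking (a few rows from the dataset).
importance_ranking = pd.DataFrame({
    'feature': ['src_bytes', 'count', 'is_host_login', 'service_X11'],
    'importance': [0.541433, 0.160338, 0.0, 0.0],
})

# The boolean mask selects the rows; .loc picks the 'feature' column
# in the same step, and tolist() converts the result to a plain list.
mask = importance_ranking['importance'] > 0
features_to_use = importance_ranking.loc[mask, 'feature'].tolist()
print(features_to_use)  # ['src_bytes', 'count']
```

Feature importances from a decision tree are non-negative, so `> 0` and `!= 0` select the same rows here.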
