python - Trying to get all column values not equal to 0.000000 in Python 3
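Problem description
I have a dataset like the one shown below. I am trying to get every name in the feature column whose value in the importance column is not equal to 0.000000, and put those names straight into a list for immediate use. I have tried several things; the two main attempts are below.
Method 1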
new_features = []
for i in importance_ranking['importance']:
    if i > 0.000000:
        new_features.append(i)
new_features
Method 1 only collects the values from the importance column, but I want the values from the feature column, so I tried Method 2.
Method 2
features_to_use = []
for x,y in importance_ranking:
    if y > 0.000000:
        features_to_use.append(x)
features_to_use
Method 2 throws the following error:
Method 2 error
ValueError Traceback (most recent call last)
<ipython-input-1181-d1ec4f141ff9> in <module>()
1 features_to_use = []
----> 2 for x,y in importance_ranking:
3 if y > 0.000000:
4 features_to_use.append(x)
5
ValueError: too many values to unpack (expected 2)
Any help is greatly appreciated.
Method 3 and error
features_to_use = []
for s,x,y in importance_ranking:
    if y > 0.000000:
        features_to_use.append(x)
features_to_use
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-1182-8ed92369130e> in <module>()
1 features_to_use = []
----> 2 for s,x,y in importance_ranking:
3 if y > 0.000000:
4 features_to_use.append(x)
5
ValueError: too many values to unpack (expected 3)
Dataset
**feature importance**
1 src_bytes 0.541433
18 count 0.160338
30 dst_host_diff_srv_rate 0.074743
53 service_bgp 0.066960
31 dst_host_same_src_port_rate 0.045040
28 dst_host_srv_count 0.027176
9 num_compromised 0.016684
25 diff_srv_rate 0.008991
58 service_pm_dump 0.008533
62 service_auth 0.008270
29 dst_host_same_srv_rate 0.006760
2 dst_bytes 0.005153
33 dst_host_serror_rate 0.004642
6 hot 0.003985
32 dst_host_srv_diff_host_rate 0.003330
35 dst_host_rerror_rate 0.002923
34 dst_host_srv_serror_rate 0.002222
87 service_klogin 0.002135
116 flag_SH 0.001553
0 duration 0.001263
7 num_failed_logins 0.001125
22 rerror_rate 0.001011
27 dst_host_count 0.000917
4 wrong_fragment 0.000736
52 service_ntp_u 0.000489
37 flag_RSTOS0 0.000468
3 land 0.000449
111 service_tftp_u 0.000355
19 srv_count 0.000289
8 logged_in 0.000284
... ... ...
16 is_host_login 0.000000
40 service_Z39_50 0.000000
41 service_http_443 0.000000
43 service_other 0.000000
44 protocol_type_tcp 0.000000
45 service_link 0.000000
46 service_X11 0.000000
47 service_exec 0.000000
48 service_red_i 0.000000
49 service_http_2784 0.000000
Line used to create the DataFrame
importance_ranking = pd.DataFrame({'feature':all_cols, 'importance':dt.feature_importances_})
New test
#features_to_use = []
a,b = importance_ranking[0]
#for s,x,y in importance_ranking:
# if y > 0.000000:
# features_to_use.append(x)
#
#features_to_use
KeyError Traceback (most recent call last)
~\Anaconda3\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
2524 try:
-> 2525 return self._engine.get_loc(key)
2526 except KeyError:
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
KeyError: 0
During handling of the above exception, another exception occurred:
KeyError Traceback (most recent call last)
<ipython-input-1244-5d9e2e614219> in <module>()
1 #features_to_use = []
----> 2 a,b = importance_ranking[0]
3 #for s,x,y in importance_ranking:
4 # if y > 0.000000:
5 # features_to_use.append(x)
~\Anaconda3\lib\site-packages\pandas\core\frame.py in __getitem__(self, key)
2137 return self._getitem_multilevel(key)
2138 else:
-> 2139 return self._getitem_column(key)
2140
2141 def _getitem_column(self, key):
~\Anaconda3\lib\site-packages\pandas\core\frame.py in _getitem_column(self, key)
2144 # get column
2145 if self.columns.is_unique:
-> 2146 return self._get_item_cache(key)
2147
2148 # duplicate columns & possible reduce dimensionality
~\Anaconda3\lib\site-packages\pandas\core\generic.py in _get_item_cache(self, item)
1840 res = cache.get(item)
1841 if res is None:
-> 1842 values = self._data.get(item)
1843 res = self._box_item_values(item, values)
1844 cache[item] = res
~\Anaconda3\lib\site-packages\pandas\core\internals.py in get(self, item, fastpath)
3841
3842 if not isna(item):
-> 3843 loc = self.items.get_loc(item)
3844 else:
3845 indexer = np.arange(len(self.items))[isna(self.items)]
~\Anaconda3\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
2525 return self._engine.get_loc(key)
2526 except KeyError:
-> 2527 return self._engine.get_loc(self._maybe_cast_indexer(key))
2528
2529 indexer = self.get_indexer([key], method=method, tolerance=tolerance)
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
KeyError: 0
Solution
I think the best approach is to use boolean indexing:
df = importance_ranking[importance_ranking['importance']>0.000000]
Then get the feature names as a list:
features = df['feature'].tolist()
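For context on why the loops in the question fail: iterating directly over a DataFrame yields its column labels rather than its rows, so `for x,y in importance_ranking` tries to unpack the string 'feature' into two names, which is what raises the ValueError. The sketch below is a minimal illustration on a small hypothetical sample (four rows copied from the question's data, not the full dataset). It shows the boolean-indexing approach in one step with `.loc`, plus a row-wise alternative using `zip` for anyone who prefers an explicit loop. It assumes importances are never negative, as in the data shown, so `> 0` is equivalent to "not equal to 0".

```python
import pandas as pd

# Hypothetical sample: a few rows taken from the question's table.
importance_ranking = pd.DataFrame({
    'feature': ['src_bytes', 'count', 'is_host_login', 'service_link'],
    'importance': [0.541433, 0.160338, 0.0, 0.0],
})

# Iterating a DataFrame yields column labels, not rows.
column_labels = list(importance_ranking)

# Boolean mask selects the rows; .loc picks the 'feature' column for them.
mask = importance_ranking['importance'] > 0
features_to_use = importance_ranking.loc[mask, 'feature'].tolist()

# Explicit-loop alternative: pair the two columns row by row with zip.
looped = [
    name
    for name, score in zip(importance_ranking['feature'],
                           importance_ranking['importance'])
    if score > 0
]

print(column_labels)
print(features_to_use)
```

The mask-based version is usually preferable because it avoids a Python-level loop, while the `zip` version stays closest to what Methods 2 and 3 were trying to do.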