Pandas: filter by row values?

Problem description

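I am trying to filter a pandas DataFrame by the values in one of its rows. Basically, I have a df in which one row contains the building type, e.g. Education, K-12; Office; Church; etc.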

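I want to build a new DataFrame filtered on these values. For example, I want to "extract" the columns whose cell value in that row equals "Education, K-12". How can I do this?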

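I have searched extensively, but most filtering examples seem to select rows based on column values. This should not be based on column values; it should select columns based on the values in a row.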

Thanks!

          SAN ANTONIO, TX SAN ANTONIO, TX.1 SAN ANTONIO, TX.2 SAN ANTONIO, TX.3  \
0         Commercial        Commercial        Commercial        Commercial   
1        Fossil Fuel       Fossil Fuel       Fossil Fuel       Fossil Fuel   
2    Education, K-12   Education, K-12   Education, K-12   Education, K-12   
..               ...               ...               ...               ...   
 

            SAN ANTONIO, TX.429  SAN ANTONIO, TX.430 SAN ANTONIO, TX.431  
0            Commercial          Commercial          Commercial  
1              Electric            Electric            Electric  
2         Office, Large       Office, Large       Office, Large  
..                  ...                 ...                 ...  


[745 rows x 432 columns]>

Tags: python, pandas, filtering

Solution


After testing a few ideas, I came up with this:

cols = df.columns[ df.iloc[2] == 'Education, K-12' ]

df[ cols ]

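df.iloc[2] returns a single row as a Series, so I can compare its values with a string. The comparison == 'Education, K-12' gives True/False for every item in that row, and I can use the resulting mask to filter the columns.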
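The same boolean mask can also be passed directly to .loc as the column selector, which avoids building the list of column names first. A minimal sketch, assuming the sample data used in this answer; the names mask and selected are only for illustration:

```python
import io
import pandas as pd

# Sample data in the same shape as the question: building types are in row 2.
text = '''SAN ANTONIO, TX;SAN ANTONIO, TX.1;SAN ANTONIO, TX.2;SAN ANTONIO, TX.3
Commercial;Commercial;Commercial;Commercial
Fossil Fuel;Fossil Fuel;Fossil Fuel;Fossil Fuel
Education, K-12;Education, K-12;Office, Large;Education, K-12'''

df = pd.read_csv(io.StringIO(text), sep=';')

# Boolean Series indexed by column name: True where row 2 matches.
mask = df.iloc[2] == 'Education, K-12'

# Keep all rows (:) and only the columns where the mask is True.
selected = df.loc[:, mask]
print(selected)
```

Both forms should select the same columns; .loc simply makes it explicit that the mask applies to the column axis.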


Minimal working example.

io.StringIO is used only to simulate a file in memory; with real data you should pass a normal filename instead.

text = '''SAN ANTONIO, TX;SAN ANTONIO, TX.1;SAN ANTONIO, TX.2;SAN ANTONIO, TX.3
Commercial;Commercial;Commercial;Commercial
Fossil Fuel;Fossil Fuel;Fossil Fuel;Fossil Fuel
Education, K-12;Education, K-12;Office, Large;Education, K-12'''

import io
import pandas as pd

df = pd.read_csv(io.StringIO(text), sep=';')

print('\n--- df ---\n')
print(df)

print('\n--- Series ---\n')
print( df.iloc[2] )

print('\n--- mask ---\n')
print( df.iloc[2] == 'Education, K-12' )

print('\n--- names ---\n')
cols = df.columns[ df.iloc[2] == 'Education, K-12' ]
print(cols)

print('\n--- columns ---\n')
print(df[ cols ])

Result:

--- df ---

   SAN ANTONIO, TX SAN ANTONIO, TX.1 SAN ANTONIO, TX.2 SAN ANTONIO, TX.3
0       Commercial        Commercial        Commercial        Commercial
1      Fossil Fuel       Fossil Fuel       Fossil Fuel       Fossil Fuel
2  Education, K-12   Education, K-12     Office, Large   Education, K-12

--- Series ---

SAN ANTONIO, TX      Education, K-12
SAN ANTONIO, TX.1    Education, K-12
SAN ANTONIO, TX.2      Office, Large
SAN ANTONIO, TX.3    Education, K-12
Name: 2, dtype: object

--- mask ---

SAN ANTONIO, TX       True
SAN ANTONIO, TX.1     True
SAN ANTONIO, TX.2    False
SAN ANTONIO, TX.3     True
Name: 2, dtype: bool

--- names ---

Index(['SAN ANTONIO, TX', 'SAN ANTONIO, TX.1', 'SAN ANTONIO, TX.3'], dtype='object')

--- columns ---

   SAN ANTONIO, TX SAN ANTONIO, TX.1 SAN ANTONIO, TX.3
0       Commercial        Commercial        Commercial
1      Fossil Fuel       Fossil Fuel       Fossil Fuel
2  Education, K-12   Education, K-12   Education, K-12
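If columns for more than one building type are needed, the same idea can probably be extended with Series.isin. A small sketch, assuming the building types sit in the row at position 2; the helper name columns_matching is hypothetical and not part of pandas:

```python
import io
import pandas as pd

def columns_matching(df, row, values):
    """Return the columns of df whose cell at positional row `row` is in `values`."""
    # isin gives a boolean Series indexed by column name.
    mask = df.iloc[row].isin(values)
    return df.loc[:, mask]

# Same sample data as in the example above.
text = '''SAN ANTONIO, TX;SAN ANTONIO, TX.1;SAN ANTONIO, TX.2;SAN ANTONIO, TX.3
Commercial;Commercial;Commercial;Commercial
Fossil Fuel;Fossil Fuel;Fossil Fuel;Fossil Fuel
Education, K-12;Education, K-12;Office, Large;Education, K-12'''

df = pd.read_csv(io.StringIO(text), sep=';')

only_office = columns_matching(df, 2, ['Office, Large'])
both_types = columns_matching(df, 2, ['Education, K-12', 'Office, Large'])

print(only_office)
print(both_types)
```

Passing a single-item list behaves like the == comparison, so one helper covers both cases.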
