Filtering a .csv file by feature using Pandas and a dictionary

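Problem description

I am trying to split a given .csv file by a feature. As per the requirements, the load_data function cannot be changed.

Is there a better way to filter the valid data (entire rows of the given .csv file) into data_valid and the invalid data into data_invalid?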

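import pandas as pd

def load_data(path):
    # Read the .csv and return it as a dict mapping column name -> list of values.
    df = pd.read_csv(path)
    data = df.to_dict(orient='list')

    return data

def filter_by_feature(data, feature):
    data_valid = {}
    data_invalid = {}

    # Walk the feature column; i is the row index of the current value.
    for i, k in enumerate(data[feature]):
        if k == 1:
            append_line_to_dict(data_valid, i)
        else:
            append_line_to_dict(data_invalid, i)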

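The append_line_to_dict function simply uses the index it receives to go over every column of the dictionary and append the entries found at that index.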
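The helper itself is not shown in the question. A minimal sketch of what it might look like, assuming it also receives the source dictionary explicitly (the `source` parameter is hypothetical; the call in the question passes only the target and the index):

```python
def append_line_to_dict(target, source, index):
    """Append row `index` of `source` to `target`, column by column.

    Both `target` and `source` map column names to lists of values.
    The `source` parameter is an assumption made for this sketch.
    """
    for key, column in source.items():
        # Create the column list in `target` on first use, then append.
        target.setdefault(key, []).append(column[index])


data = {"ind": [0, 1], "name": ["James", "Dylan"], "is_legal": [1, 0]}
data_valid = {}
append_line_to_dict(data_valid, data, 0)
print(data_valid)  # {'ind': [0], 'name': ['James'], 'is_legal': [1]}
```

Calling it once per row, as the loop above does, rebuilds each output dictionary one row at a time.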

For example, given this .csv

ind name  is_legal
0   James 1
1   Dylan 0
2   Sam   1
3   Jake  1

the data looks like

data = {'ind': [0, 1, 2, 3], 'name': ["James", "Dylan", "Sam", "Jake"], 'is_legal': [1, 0, 1, 1]}

and data_valid should look like

data_valid = {'ind': [0, 2, 3], 'name': ["James", "Sam", "Jake"], 'is_legal': [1, 1, 1]}

This is my code

data = load_data(path)
filter_by_feature(data,"is_legal")

Tags: python, pandas

Solution


You can do it like this:

data = {
    "ind": [0, 1, 2, 3],
    "name": ["James", "Dylan", "Sam", "Jake"],
    "is_legal": [1, 0, 1, 1]
}

def filter_by_feature(data, feature):
    """Split column-oriented data into rows where data[feature] == 1 and the rest."""
    data_valid = {}
    data_invalid = {}

    # Row positions whose feature value is 1 (valid) or anything else (invalid).
    valid_indices = [i for i, x in enumerate(data[feature]) if x == 1]
    invalid_indices = [i for i, x in enumerate(data[feature]) if x != 1]

    # Pick the same row positions out of every column.
    for key, item in data.items():
        data_valid[key] = [item[i] for i in valid_indices]
        data_invalid[key] = [item[i] for i in invalid_indices]

    return data_valid, data_invalid

data_valid, data_invalid = filter_by_feature(data, 'is_legal')
print(data_valid)
print(data_invalid)

Output:

{'ind': [0, 2, 3], 'name': ['James', 'Sam', 'Jake'], 'is_legal': [1, 1, 1]}
{'ind': [1], 'name': ['Dylan'], 'is_legal': [0]}
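
Since load_data already relies on pandas, another option is to rebuild a DataFrame from the dictionary and split it with a boolean mask. This is only a sketch under that assumption and is not part of the original answer; the name `filter_by_feature_pandas` is made up for illustration:

```python
import pandas as pd


def filter_by_feature_pandas(data, feature):
    """Split column-oriented data using a pandas boolean mask."""
    df = pd.DataFrame(data)
    mask = df[feature] == 1  # True for the rows treated as valid
    # Convert each half back to the same dict-of-lists shape load_data returns.
    data_valid = df[mask].to_dict(orient="list")
    data_invalid = df[~mask].to_dict(orient="list")
    return data_valid, data_invalid


data = {
    "ind": [0, 1, 2, 3],
    "name": ["James", "Dylan", "Sam", "Jake"],
    "is_legal": [1, 0, 1, 1],
}
data_valid, data_invalid = filter_by_feature_pandas(data, "is_legal")
print(data_valid)
print(data_invalid)
```

This leaves load_data untouched and keeps the dict-of-lists format on both sides, at the cost of one extra conversion to and from a DataFrame.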
