Filtering a list in Python

Question

I have a Python list with the following columns:

Username, Function, Project, Description, Date, Time, Year, Version

['erinil01', 'Oppstart', '', 'Startet programmet', '06/07/21', '12:48:54', '2021', '2']
['erinil01', 'Oppstart', '', 'Startet programmet', '06/07/21', '12:56:49', '2021', '2']
['erinil01', 'Prosjektadmin', '920208', 'Lastet prosjektet', '06/07/21', '12:59:09', '2021', '2']
['erinil01', 'Prosjektadmin', '920208', 'Lagret prosjektet', '06/07/21', '12:59:17', '2021', '2']
['erh4021', 'Oppstart', '', 'Startet programmet', '06/07/21', '13:02:38', '2021', '2']
['erinil01', 'Prosjektadmin', '921106', 'Lagt til nytt prosjekt', '06/07/21', '13:06:45', '2021', '2']
['erinil01', 'Prosjektadmin', '921107', 'Lagt til nytt prosjekt', '06/07/21', '13:07:02', '2021', '2']
['erinil01', 'Prosjektadmin', '921106', 'Lastet prosjektet', '06/07/21', '13:07:08', '2021', '2']

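Say I want to filter this list on different criteria, such as Username, Function, Project, Date, or Year. If some of the filters are left empty, the result should show everything that matches the remaining criteria.

Any hints?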

Tags: python, list, if-statement, filter

Answer


You didn't say how you get this list, but it looks like a nested list.

data = [
    ['erinil01', 'Oppstart', '', 'Startet programmet', '06/07/21', '12:48:54', '2021', '2'],
    ['erinil01', 'Oppstart', '', 'Startet programmet', '06/07/21', '12:56:49', '2021', '2'],
    ['erinil01', 'Prosjektadmin', '920208', 'Lastet prosjektet', '06/07/21', '12:59:09', '2021', '2'],
    ['erinil01', 'Prosjektadmin', '920208', 'Lagret prosjektet', '06/07/21', '12:59:17', '2021', '2'],
    ['erh4021', 'Oppstart', '', 'Startet programmet', '06/07/21', '13:02:38', '2021', '2'],
    ['erinil01', 'Prosjektadmin', '921106', 'Lagt til nytt prosjekt', '06/07/21', '13:06:45', '2021', '2'],
    ['erinil01', 'Prosjektadmin', '921107', 'Lagt til nytt prosjekt', '06/07/21', '13:07:02', '2021', '2'],
    ['erinil01', 'Prosjektadmin', '921106', 'Lastet prosjektet', '06/07/21', '13:07:08', '2021', '2'],
]

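With a nested list you have to use a for-loop to process each row separately.

In each row you can use an index to check a value.

This gets all rows that have an empty value in Project, which has index [2]: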

filtered_data = []

for row in data:
    if not row[2]:
        #print('empty:', row)
        filtered_data.append(row)
      
print('--- filtered_data ---')

for row in filtered_data:
    print(row)

For more complex filters you have to write a more complex if condition.
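As a small sketch of such a condition, the loop below keeps only rows where Username is 'erinil01' and Function is 'Prosjektadmin'. The column positions (0 and 1) are assumed from the header line in the question, and the shortened `data` list is there only to keep the example self-contained.

```python
# Shortened copy of the data from the question, so the example runs on its own.
data = [
    ['erinil01', 'Oppstart', '', 'Startet programmet', '06/07/21', '12:48:54', '2021', '2'],
    ['erinil01', 'Prosjektadmin', '920208', 'Lastet prosjektet', '06/07/21', '12:59:09', '2021', '2'],
    ['erh4021', 'Oppstart', '', 'Startet programmet', '06/07/21', '13:02:38', '2021', '2'],
]

filtered_data = []

for row in data:
    # index 0 is Username, index 1 is Function
    if row[0] == 'erinil01' and row[1] == 'Prosjektadmin':
        filtered_data.append(row)

for row in filtered_data:
    print(row)
```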

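To make it more universal, you can create a function that takes a single row and returns True or False, depending on whether you want to keep that row.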

def selected(row):
    #if not row[2]:
    #    return True
    #else:
    #    return False
    
    # shorter
    return not row[2]

filtered_data = []

for row in data:
    if selected(row):
        #print('empty:', row)
        filtered_data.append(row)

Then you can even reduce it to a list comprehension

filtered_data = [row for row in data if selected(row)]

or use the built-in function filter()

filtered_data = list(filter(selected, data))

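This way you can create different selected() functions and combine filters. Applied one after another, as here, they keep only the rows that pass every filter (a logical and).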

filtered_data = list(filter(selected_1, data))
filtered_data = list(filter(selected_2, filtered_data))
filtered_data = list(filter(selected_3, filtered_data))
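The question also asks that an empty filter should not restrict the result. One possible way to get that behaviour, shown here only as a sketch, is to build the selected() function from the criteria and skip the ones that are empty. The names `COLUMNS` and `make_selected` are hypothetical helpers made up for this example, and the column positions are assumed from the header line in the question.

```python
# Column positions, assumed from the header line in the question.
COLUMNS = {'username': 0, 'function': 1, 'project': 2, 'date': 4, 'year': 6}

def make_selected(**criteria):
    """Return a function selected(row) that checks only the non-empty criteria."""
    # drop filters that were left empty ('' or None), so they restrict nothing
    active = {COLUMNS[name]: value for name, value in criteria.items() if value}

    def selected(row):
        # all() of an empty sequence is True, so no active filters keeps every row
        return all(row[index] == value for index, value in active.items())

    return selected

# Shortened copy of the data from the question, so the example runs on its own.
data = [
    ['erinil01', 'Oppstart', '', 'Startet programmet', '06/07/21', '12:48:54', '2021', '2'],
    ['erh4021', 'Oppstart', '', 'Startet programmet', '06/07/21', '13:02:38', '2021', '2'],
    ['erinil01', 'Prosjektadmin', '921106', 'Lagt til nytt prosjekt', '06/07/21', '13:06:45', '2021', '2'],
    ['erinil01', 'Prosjektadmin', '921106', 'Lastet prosjektet', '06/07/21', '13:07:08', '2021', '2'],
]

# `function` is left empty, so only username and project are checked
selected = make_selected(username='erinil01', function='', project='921106')
filtered_data = list(filter(selected, data))

for row in filtered_data:
    print(row)
```

A side effect of this design is that an empty string always means "no filter", so this sketch cannot select rows whose Project is empty; that case would need a separate marker value.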

BTW:

If you get the data from a database, you can filter it directly in the SQL query when you fetch it.
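The question does not say which database is used (if any), so the following is only a sketch with the standard sqlite3 module and an in-memory database. The table name `log` and its column names are assumptions made up for the example.

```python
import sqlite3

# In-memory database with a hypothetical table `log`; the real schema is not known.
connection = sqlite3.connect(':memory:')
connection.execute(
    'CREATE TABLE log (username TEXT, func TEXT, project TEXT, description TEXT)'
)
connection.executemany(
    'INSERT INTO log VALUES (?, ?, ?, ?)',
    [
        ('erinil01', 'Oppstart', '', 'Startet programmet'),
        ('erinil01', 'Prosjektadmin', '920208', 'Lastet prosjektet'),
        ('erh4021', 'Oppstart', '', 'Startet programmet'),
    ],
)

# the WHERE clause does the filtering; `?` placeholders pass the values safely
query = 'SELECT * FROM log WHERE username = ? AND func = ?'
rows = connection.execute(query, ('erinil01', 'Oppstart')).fetchall()

for row in rows:
    print(row)

connection.close()
```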

If you can keep the data in a pandas.DataFrame, you can use the column names Username, Function, Project, Description, Date, Time, Year, Version to filter it.


EDIT:

Minimal working example

data = [
    ['erinil01', 'Oppstart', '', 'Startet programmet', '06/07/21', '12:48:54', '2021', '2'],
    ['erinil01', 'Oppstart', '', 'Startet programmet', '06/07/21', '12:56:49', '2021', '2'],
    ['erinil01', 'Prosjektadmin', '920208', 'Lastet prosjektet', '06/07/21', '12:59:09', '2021', '2'],
    ['erinil01', 'Prosjektadmin', '920208', 'Lagret prosjektet', '06/07/21', '12:59:17', '2021', '2'],
    ['erh4021', 'Oppstart', '', 'Startet programmet', '06/07/21', '13:02:38', '2021', '2'],
    ['erinil01', 'Prosjektadmin', '921106', 'Lagt til nytt prosjekt', '06/07/21', '13:06:45', '2021', '2'],
    ['erinil01', 'Prosjektadmin', '921107', 'Lagt til nytt prosjekt', '06/07/21', '13:07:02', '2021', '2'],
    ['erinil01', 'Prosjektadmin', '921106', 'Lastet prosjektet', '06/07/21', '13:07:08', '2021', '2'],
]

# --- version 1 ---

filtered_data = []

for row in data:
    if (not row[2]) or (int(row[2]) > 920208):
        #print('empty:', row)
        filtered_data.append(row)
      
print('--- filtered_data ---')

for row in filtered_data:
    print(row)
    
# --- version 2 ---
    
def selected(row):
    #if (not row[2]) or (int(row[2]) > 920208):
    #    return True
    #else:
    #    return False
    
    # shorter
    return (not row[2]) or (int(row[2]) > 920208)
    
def selected_1(row):
    return not row[2]

def selected_2(row):
    # call only when row[2] is not empty: int('') raises ValueError
    return int(row[2]) > 920208

filtered_data = []

for row in data:
    if selected_1(row) or selected_2(row):
    #if selected(row):
        #print('empty:', row)
        filtered_data.append(row)
      
print('--- filtered_data ---')

for row in filtered_data:
    print(row)
    
# --- version 3 ---
    
def selected(row):
    return (not row[2]) or (int(row[2]) > 920208)
    
def selected_1(row):
    return not row[2]

def selected_2(row):
    return int(row[2]) > 920208

filtered_data = [row for row in data if selected(row)]
filtered_data = [row for row in data if selected_1(row) or selected_2(row)]
      
print('--- filtered_data ---')

for row in filtered_data:
    print(row)
    
# --- version 4 ---
    
def selected(row):
    return (not row[2]) or (int(row[2]) > 920208)
    
def selected_1(row):
    return not row[2]

def selected_2(row):
    return int(row[2]) > 920208

#filtered_data = list(filter(selected, data))
filtered_data = list(filter(lambda row:selected_1(row) or selected_2(row), data))
      
print('--- filtered_data ---')

for row in filtered_data:
    print(row)            

EDIT:

The same with pandas

data = [
    ['erinil01', 'Oppstart', '', 'Startet programmet', '06/07/21', '12:48:54', '2021', '2'],
    ['erinil01', 'Oppstart', '', 'Startet programmet', '06/07/21', '12:56:49', '2021', '2'],
    ['erinil01', 'Prosjektadmin', '920208', 'Lastet prosjektet', '06/07/21', '12:59:09', '2021', '2'],
    ['erinil01', 'Prosjektadmin', '920208', 'Lagret prosjektet', '06/07/21', '12:59:17', '2021', '2'],
    ['erh4021', 'Oppstart', '', 'Startet programmet', '06/07/21', '13:02:38', '2021', '2'],
    ['erinil01', 'Prosjektadmin', '921106', 'Lagt til nytt prosjekt', '06/07/21', '13:06:45', '2021', '2'],
    ['erinil01', 'Prosjektadmin', '921107', 'Lagt til nytt prosjekt', '06/07/21', '13:07:02', '2021', '2'],
    ['erinil01', 'Prosjektadmin', '921106', 'Lastet prosjektet', '06/07/21', '13:07:08', '2021', '2'],
]
    
import pandas as pd
import numpy as np

df = pd.DataFrame(data, columns=['Username', 'Function', 'Project', 'Description', 'Date', 'Time', 'Year', 'Version'])
df = df.replace('', np.nan)  # empty string -> NaN, so `Project` can be converted with `astype(float)`
print(df)

mask1 = df['Project'].isnull()  # detect `np.nan`
#print(mask1)

mask2 = (df['Project'].astype(float) > 920208)
#print(mask2)

filtered_data = df[ mask1 | mask2 ]  # `|` means `or` , `&` means `and`

print('--- filtered_data ---')

print(filtered_data)
