Using the next row's value as a condition

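Problem description

Suppose I have a log file that has been parsed into a pandas.DataFrame.

I would like to create a new boolean column that is True only when the current row contains the EXPRESSION_1 string and the next row contains the EXPRESSION_2 string.

So far I can only do this for a single expression, as shown in Example 1 below:

Example 1: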

import pandas as pd


EXPRESSION_1 = 'Starts streaming the stream rtspsrc'
EXPRESSION_2 = 'initializing gst pipeline'
df = pd.DataFrame(
    {
        'message': [
            'Some log text',
            'Some log text',
            'Starts streaming the stream rtspsrc',
            'Some log text',
            'Some log text',
            'Starts streaming the stream rtspsrc',
            'initializing gst pipeline',
            'Some log text',
        ]

    }
)
df.loc[:, 'process_started'] = df.loc[:, 'message'].apply(lambda msg: True if msg.find(EXPRESSION_1) > -1 else False)
df

Output of Example 1:

    message                                 process_started
0   Some log text                           False
1   Some log text                           False
2   Starts streaming the stream rtspsrc     True
3   Some log text                           False
4   Some log text                           False
5   Starts streaming the stream rtspsrc     True
6   initializing gst pipeline               False
7   Some log text                           False

Desired output:

    message                                 process_started
0   Some log text                           False
1   Some log text                           False
2   Starts streaming the stream rtspsrc     False # <= Note the False here
3   Some log text                           False
4   Some log text                           False
5   Starts streaming the stream rtspsrc     True
6   initializing gst pipeline               False
7   Some log text                           False

Thanks in advance for any suggestions.

Tags: python-3.x, pandas, dataframe

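Solution

You can do this with the shift operation. Simply put, shift(-1) in the code below moves the message column up by one row, so each row can be compared with the message of the row that follows it: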

import pandas as pd

EXPRESSION_1 = 'Starts streaming the stream rtspsrc'
EXPRESSION_2 = 'initializing gst pipeline'
df = pd.DataFrame(
    {
        'message': [
            'Some log text',
            'Some log text',
            'Starts streaming the stream rtspsrc',
            'Some log text',
            'Some log text',
            'Starts streaming the stream rtspsrc',
            'initializing gst pipeline',
            'Some log text',
        ]

    }
)
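# Start by flagging every row whose message contains EXPRESSION_1.
df.loc[:, 'process_started'] = df.loc[:, 'message'].apply(lambda msg: EXPRESSION_1 in msg)

# shift(-1) lines up each row with the message of the next row.
# Keep the flag when the next message is EXPRESSION_2, clear it otherwise.
df.loc[(df['message'] == EXPRESSION_1) & (df['message'].shift(-1) == EXPRESSION_2), 'process_started'] = True
df.loc[(df['message'] == EXPRESSION_1) & (df['message'].shift(-1) != EXPRESSION_2), 'process_started'] = False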

Output:

    message                                 process_started
0   Some log text                           False
1   Some log text                           False
2   Starts streaming the stream rtspsrc     False
3   Some log text                           False
4   Some log text                           False
5   Starts streaming the stream rtspsrc     True
6   initializing gst pipeline               False
7   Some log text                           False
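
The code above compares messages with ==, while the question asks for rows that contain the expressions. As a minimal sketch of that variant (assuming the expressions are plain substrings rather than regular expressions), the same shift(-1) idea can be combined with str.contains to build the column in a single pass. The helper names has_expr_1, has_expr_2 and next_has_expr_2 are only illustrative and do not appear in the original post:

```python
import pandas as pd

EXPRESSION_1 = 'Starts streaming the stream rtspsrc'
EXPRESSION_2 = 'initializing gst pipeline'

df = pd.DataFrame({'message': [
    'Some log text',
    'Some log text',
    'Starts streaming the stream rtspsrc',
    'Some log text',
    'Some log text',
    'Starts streaming the stream rtspsrc',
    'initializing gst pipeline',
    'Some log text',
]})

# Substring tests; regex=False treats the expressions as literal text.
has_expr_1 = df['message'].str.contains(EXPRESSION_1, regex=False)
has_expr_2 = df['message'].str.contains(EXPRESSION_2, regex=False)

# shift(-1) lines up each row with the value from the next row. The last
# row has no successor, so fill_value=False is used for it.
next_has_expr_2 = has_expr_2.shift(-1, fill_value=False)

# True only where this row has EXPRESSION_1 and the next row has EXPRESSION_2.
df['process_started'] = has_expr_1 & next_has_expr_2
print(df)
```

With the sample data, only row 5 should end up True, matching the desired output in the question. If the message column can contain missing values, str.contains would need its na argument set as well; that case is not covered by this sketch.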
