Pandas: using .loc to change dataset values for specific rows

Problem description

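I'm new to programming and I'm working on a Python project using pandas. I want to use .loc to change the value in each row of my dataset, but it doesn't seem to work. My idea is to set a row to the EOL value if that row equals 0. The code raises no error, but my dataset is unchanged after the iteration. Here is the code: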

for machines in telemetry_days['machineID']:
    EOL = 365
    i = 0

    for row in telemetry_days['failure_comp1'].loc[(telemetry_days['machineID'] == machines)]:

        if (row != 0):
            EOL = row

        elif (row == 0):
            telemetry_days['failure_comp1'].loc[(telemetry_days['machineID'] == machines)].iloc[i] = EOL
        i = i + 1

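I think the 'failure_comp1' values in the dataset aren't changing because I'm using .iloc. But I don't know how to get a specific row from .loc without using .iloc. If anyone has any suggestions I would be very grateful, thanks.

This is the structure of the whole dataset (never mind the NaNs): [image not included]

Here is a sample of what I have (for one "machine"):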

index failure_comp1
67    0
254   150
568   0
850   0
998   345

I want it to become this:

index failure_comp1
67    365
254   150
568   150
850   150
998   345

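This is a time-series dataset, and I want to label each component of a machine with its end-of-life time (in days). I have already labeled the component on the date it failed, but I want every row to carry that label for that particular component.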

Tags: python, pandas, dataframe

Solution


I wouldn't iterate through the rows (although you can if you want; I'll show that solution too). Instead I would use .groupby('machineID'): 1) Convert all the 0s to NaN. 2) Forward fill the NaNs within each machine. 3) This leaves any 0s before a machine's first nonzero value as NaN, so finally fillna with 365.

Given this sample data set:

import pandas as pd

telemetry_days = pd.DataFrame({
    'machineID':['11','22','33','44','11','22','33','44','11','22','33','44','11','22','33','44','11','22','33','44'],
    'failure_comp1':[0,2,45,0, 
                     150,150,232,0, 
                     0, 0, 0, 0, 
                     0, 12, 0, 0,
                     345, 12, 0, 0]})

Code:

import pandas as pd
import numpy as np

# 1) treat 0 as missing
telemetry_days['failure_comp1'] = telemetry_days['failure_comp1'].replace(0, np.nan)
# 2) forward fill within each machine, 3) rows before the first failure get 365
telemetry_days['failure_comp1'] = telemetry_days.groupby('machineID')['failure_comp1'].ffill().fillna(365)
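
As a quick check, the same three steps can be applied to the single-machine sample from the question. This is only a minimal sketch: the helper name fill_eol, its default parameter, and the machine ID '1' are made up for illustration, and the final cast to int is added just so the result matches the integer values in the expected output.

```python
import numpy as np
import pandas as pd


def fill_eol(df, default=365):
    """Hypothetical helper: forward-fill nonzero failure values per machine."""
    out = df.copy()
    filled = (
        out['failure_comp1']
        .replace(0, np.nan)          # treat 0 as "missing"
        .groupby(out['machineID'])   # fill within each machine only
        .ffill()
        .fillna(default)             # rows before the first failure
    )
    out['failure_comp1'] = filled.astype(int)
    return out


# The sample from the question; the machine ID '1' is an assumption.
sample = pd.DataFrame(
    {'machineID': ['1'] * 5, 'failure_comp1': [0, 150, 0, 0, 345]},
    index=[67, 254, 568, 850, 998],
)
result = fill_eol(sample)
print(result['failure_comp1'].tolist())
```

Working on a copy keeps the original frame intact, which makes it easy to compare the values before and after.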

If you want to use the .loc or .iloc:

Here's how I would do it. I would loop through each unique machineID, filter the dataframe to get just that machine's rows, then iterate through that sub-group. I also would not hard-code the i (index): .items() (called .iteritems() before pandas 2.0) or .iterrows() returns the index value for you, so just use that.

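for machines in telemetry_days['machineID'].unique():
    EOL = 365  # default until the first nonzero failure value is seen

    # .items() yields (index label, value) pairs for this machine's rows
    for i, row in telemetry_days.loc[telemetry_days['machineID'] == machines, 'failure_comp1'].items():

        if row != 0:
            EOL = row

        else:
            # assign by label in one .loc call so the frame itself is updated
            telemetry_days.loc[i, 'failure_comp1'] = EOL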
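
A small sketch may help show why the loop in the question changes nothing. Selecting with a boolean mask returns a new object, so a value assigned into that selection never reaches the original frame, whereas a single .loc call with a row label and a column name writes to the frame itself. The tiny frame df below is invented for illustration.

```python
import pandas as pd

df = pd.DataFrame({'machineID': ['1', '1', '2'], 'failure_comp1': [0, 150, 0]})
mask = df['machineID'] == '1'

# Boolean-mask selection produces a copy of the matching rows.
subset = df['failure_comp1'].loc[mask]
subset.iloc[0] = 365                        # modifies only the copy
unchanged = df['failure_comp1'].tolist()    # df still holds the old values
print(unchanged)

# One .loc call with (row label, column) assigns into df directly.
first_label = df[mask].index[0]
df.loc[first_label, 'failure_comp1'] = 365
print(df['failure_comp1'].tolist())
```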
