How can I force pandas to return a view of a DataFrame so that I can apply a transformation?

Question

I have a pandas DataFrame named merge that looks like this:

filepath                        timestamp  label_x  label_y X   Y   W   H
S6/N11/N11_R1/S6_N11_R1_IMAG0274    -----   empty   NaN NaN NaN NaN NaN
S6/N11/N11_R1/S6_N11_R1_IMAG0275    -----   empty   NaN NaN NaN NaN NaN
S6/N11/N11_R1/S6_N11_R1_IMAG0276    -----   empty   NaN NaN NaN NaN NaN
S6/N11/N11_R1/S6_N11_R1_IMAG0277    -----   empty   NaN NaN NaN NaN NaN

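Some timestamps are missing, and I want to read them from the image metadata (the image location is given by the filepath column). As you can see, filepath contains folders whose names start with S6. These folders should range from S1 to S6, but for now I only have folders S1 and S2. I want to slice out the rows for these folders and perform the transformation: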

import PIL.Image

def transformation(row):
    try:
        # path0 (the base directory of the images) is defined elsewhere
        img = PIL.Image.open(path0 / row["filepath"])
        row["timestamp"] = img._getexif()[306]  # EXIF tag 306 is DateTime
        return row
    except:
        return

merge[(merge["timestamp"] == '-----') & (merge["filepath"].str.startswith("S1") | merge["filepath"].str.startswith("S2"))].apply(transformation, axis=1)

But this does not work, because the slicing operation fundamentally returns a copy:

>>> merge[(merge["timestamp"] == '-----') & (merge["filepath"].str.startswith("S1") | merge["filepath"].str.startswith("S2"))]._is_view
False

How can I change pandas' behavior so that I get a view?
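
For context, the lost update can be reproduced without any image files. The sketch below uses a small made-up frame (the paths and the placeholder timestamp are illustrative, not taken from the original data) and a hypothetical `fill_timestamp` function standing in for the EXIF lookup. It shows that the boolean selection is a separate object and that `apply` hands back yet another new object, so the original frame stays unchanged unless the result is assigned back.

```python
import pandas as pd

# Made-up miniature of the frame in the question (illustrative values only).
df = pd.DataFrame({
    "filepath": ["S1/a/IMG_0001", "S2/b/IMG_0002", "S6/c/IMG_0003"],
    "timestamp": ["-----", "-----", "-----"],
})

def fill_timestamp(row):
    # Hypothetical stand-in for the EXIF lookup: writes a placeholder value.
    row["timestamp"] = "2020:01:01 00:00:00"
    return row

mask = (df["timestamp"] == "-----") & (
    df["filepath"].str.startswith("S1") | df["filepath"].str.startswith("S2")
)

# apply() returns a new DataFrame; nothing is written back into df.
result = df[mask].apply(fill_timestamp, axis=1)

print(result["timestamp"].tolist())  # the two selected rows carry the placeholder
print(df["timestamp"].tolist())      # the original column still holds '-----' everywhere
```

The result only becomes visible in the original frame once it is written back explicitly, for example through an index-aligned assignment.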

Tags: python, pandas, dataframe

Solution


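You can apply your function and pass the result to update, but the function needs to return the new value for each row, so that apply(..., axis=1) produces a Series: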

# sample df
# df = pd.read_clipboard()
# df.iloc[0:1, 1] = 'some value'

                           filepath   timestamp label_x  label_y   X   Y   W  \
0  S6/N11/N11_R1/S6_N11_R1_IMAG0274  some value   empty      NaN NaN NaN NaN   
1  S6/N11/N11_R1/S6_N11_R1_IMAG0275       -----   empty      NaN NaN NaN NaN   
2  S6/N11/N11_R1/S6_N11_R1_IMAG0276       -----   empty      NaN NaN NaN NaN   
3  S6/N11/N11_R1/S6_N11_R1_IMAG0277       -----   empty      NaN NaN NaN NaN   

    H  
0 NaN  
1 NaN  
2 NaN  
3 NaN  

Now use update, apply and loc:

# your function
def myFunc(row):
    row['timestamp'] = 'some new value'  # set the new value for timestamp
    return row['timestamp']  # return the value; apply(axis=1) collects these into a Series

# apply your function to the selected rows and update the column with the result
# (note: with pandas Copy-on-Write enabled, this chained update does not modify df)
df['timestamp'].update(df.loc[2:3].apply(myFunc, axis=1))
# you would change df.loc[2:3] to your boolean mask:
# df.loc[((df["timestamp"]=='-----') & (df['filepath'].str.startswith('S1') | df['filepath'].str.startswith('S2')))]

Out:

                           filepath       timestamp label_x  label_y   X   Y  \
0  S6/N11/N11_R1/S6_N11_R1_IMAG0274      some value   empty      NaN NaN NaN   
1  S6/N11/N11_R1/S6_N11_R1_IMAG0275           -----   empty      NaN NaN NaN   
2  S6/N11/N11_R1/S6_N11_R1_IMAG0276  some new value   empty      NaN NaN NaN   
3  S6/N11/N11_R1/S6_N11_R1_IMAG0277  some new value   empty      NaN NaN NaN   

    W   H  
0 NaN NaN  
1 NaN NaN  
2 NaN NaN  
3 NaN NaN  
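
Applied to the mask from the question, the same idea can also be written as a direct `.loc` assignment, which avoids modifying the column through an intermediate object (under pandas' Copy-on-Write mode, `df['timestamp'].update(...)` operates on a temporary Series and leaves `df` untouched). This is only a sketch under assumptions: `read_timestamp` and `FAKE_EXIF` are hypothetical stand-ins for the EXIF lookup, and the sample paths and timestamps are made up.

```python
import pandas as pd

# Made-up sample data (illustrative values only).
df = pd.DataFrame({
    "filepath": ["S1/a/IMG_0001", "S2/b/IMG_0002", "S6/c/IMG_0003", "S1/d/IMG_0004"],
    "timestamp": ["-----", "2019:05:01 10:00:00", "-----", "-----"],
})

# Hypothetical stand-in for the image metadata; the real code would open the file.
FAKE_EXIF = {"S1/a/IMG_0001": "2020:01:01 08:00:00"}

def read_timestamp(row):
    # Return only the new value; if the lookup fails, keep the current one.
    try:
        return FAKE_EXIF[row["filepath"]]
    except KeyError:
        return row["timestamp"]

mask = (df["timestamp"] == "-----") & (
    df["filepath"].str.startswith("S1") | df["filepath"].str.startswith("S2")
)

# apply() yields a Series indexed like the selected rows;
# the .loc assignment aligns it on that index and writes into df itself.
df.loc[mask, "timestamp"] = df.loc[mask].apply(read_timestamp, axis=1)

print(df)
```

Returning the existing value on failure, rather than `None`, keeps rows whose image cannot be read from being overwritten with missing values.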
